Frank's Activity Log: October 2005 Archives

October 31, 2005 -- Monday

Admin

  • DH Issue: Replacement 146GB drive arrived and was installed in the DotHill SANnetII; failed drive returned via FedEx. Tracking #8388321927610215.
  • FP Issue=16876 Proj=4: No hung imapd found. IE on OS/X with webmail? That combination has never worked with the new version of webmail :(

LDAP

  • "Student Employee" error (from Saturday morning run) handled.

WebCT

  • Renamed "whale" to "carp"
  • Racked the second application server (carp)

October 28, 2005 -- Friday

Admin

  • FP Issue #16853: Hung imapd from 19 October.
  • Opened an issue with DotHill to get a replacement 146GB drive for zoosan4 (one failed last night).
  • FP Issue #16858: Hung imapd from 12:23 this afternoon.
  • DNS: Defined two permanent names in the campus.ad.uvm.edu domain for the WINS systems (nsupdate mouse).
  • FP Issue #16859: Hung imapd from Oct 25.
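For reference, nsupdate reads its commands from a file or from standard input. Below is a minimal Python sketch that generates such a command file for static A records. The host names, the 192.0.2.x addresses (a documentation-only range), and the one-day TTL are hypothetical placeholders, not the actual WINS entries, and any authentication the campus.ad.uvm.edu zone requires is not shown.

```python
def build_nsupdate_script(server, zone, records, ttl=86400):
    """Return nsupdate input that adds one A record per (name, address) pair.

    records: iterable of (fully_qualified_name, ipv4_address) tuples.
    ttl: record lifetime in seconds; the one-day default is an assumption.
    """
    lines = [f"server {server}", f"zone {zone}"]
    for name, address in records:
        lines.append(f"update add {name} {ttl} A {address}")
    lines.append("send")  # submit all queued updates as one request
    return "\n".join(lines) + "\n"


# Hypothetical names; 192.0.2.0/24 is a documentation-only address range.
print(build_nsupdate_script(
    "mouse",
    "campus.ad.uvm.edu",
    [("wins1.campus.ad.uvm.edu.", "192.0.2.10"),
     ("wins2.campus.ad.uvm.edu.", "192.0.2.11")],
))
```

Save the output to a file and hand it to `nsupdate`, or pipe it in on standard input.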

FootPrints

  • Issue #56359
    • Responded to the update from UniPress, restating the problem to correct the misunderstanding their response was based on.
    • Bess found two agents (Hope and Justin) that are not working correctly. We called UniPress and talked with Linda. She had us upload a tar of the FootPrints etc directory for analysis.
  • Examining the install scripts for 7.0 -- need to modify the mrMySQLConfig script to use a unique database name (at least as a start).

SecurID

  • C0531558
    • Obtain the 6.1 install media, documentation, and instructions to upgrade our ACE/Server 5.2 configuration
    • Well, NO! 6.1 distribution has been discontinued due to a database corruption issue. Current recommendation by RSA is to stay with 5.2.
    • Case (and project) closed!

October 27, 2005 -- Thursday

Admin

  • New three-year SSL certificate for the giraffe web server.
    • Installed the new certificate files and modified the httpd.conf file to specify the correct ServerAdmin address.
    • Will need to recycle the web server this evening to make sure I didn't mess anything up.
  • FP Proj=4 Issue=16841 -- hung imapd.
  • /rack2e filesystem at 97%.
    • Moved two users off to /rack3a.
    • The first was simply a large account, moved to get the pages to stop.
    • The second was the actual offender. However, my email to him bounced because he's over quota, and he doesn't have a local phone number listed.
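Finding which account is the real offender on a full filesystem is a du-style scan. Here is a rough Python sketch, assuming each immediate subdirectory of the scanned root is one account's home (the real /users layout is hashed, so the root would need adjusting). It sums apparent file sizes rather than allocated blocks, so its numbers will differ somewhat from `du`.

```python
import os


def tree_size(path):
    """Total apparent size in bytes of regular files under path (symlinks are not followed)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):  # os.walk does not follow symlinked dirs by default
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):
                try:
                    total += os.path.getsize(full)
                except OSError:
                    pass  # file vanished or is unreadable; skip it
    return total


def largest_subdirs(root, top=5):
    """Return [(size_bytes, name), ...] for the biggest immediate subdirectories of root."""
    sizes = []
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((tree_size(entry.path), entry.name))
    sizes.sort(reverse=True)  # largest first
    return sizes[:top]
```

Usage would be along the lines of `largest_subdirs("/rack2e", top=10)`.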

LDAP

  • Installed the new ldap-stats.pl script on all the servers

SecurID

  • Research process to upgrade from 5.2 to 6.1
  • Request 6.1 media from RSA

Calendar

  • Check the Oracle site to see whether the Treo or BlackBerry devices are supported with the version of Calendar server we are using
  • Nope... Oracle specifically says the Treo 650 is not supported, and it supports only a single BlackBerry model, the 6710. Carol says the connector didn't work well at all with the BlackBerry she attempted to set up in August.

FootPrints

  • UniPress Issue #56359
    • Opened issue to see if we have data corruption or a local problem, because we renamed an unused project but can't get the project management page to show the correct list of agents
  • UniPress Issue #56360
    • Opened issue to see what kind of trouble I was going to get myself into attempting to install FP v7 on the same system pointed at the same remote MySQL server (in test mode) that the production FP v6.5c is using.
    • UniPress says I'm going to shoot my foot off because the name of the MySQL database is hardcoded and cannot be changed...
    • I believe this is a job for Sir Hacks-a-lot!

October 26, 2005 -- Wednesday

Admin

  • Apache error report on giraffe forwarded from Ed: asked Mike G. whether it could be related to his work upgrading Oracle on giraffe yesterday.
  • ~account path error on moose again. I think I have found the last of it and fixed it all this time.
  • The credit card system seems to have had issues again last night. The restart apparently didn't work last night, but it does appear to have worked properly this morning when it found the system broken again. Asked John to check that things were flowing correctly.

FootPrints

  • Apparently, I missed that UniPress isn't supporting version 6.5 anymore and now I need to plan the upgrade.
  • Worked with Stef to debug why she's unable to append to issues in project 19. The code is claiming that her email address is not an assignee -- which is true, but her netid is an assignee. She's working with Bess to see if there is anything radically different about this project versus ones that do work, then we'll see about talking with UniPress on the issue.

SecurID

  • Verified that the backup job ran appropriately (it took less than a minute).
  • Adjusted the script to not run in debug mode.
  • Set it to run at 1:10 AM every day
  • Set the "live" backup on the primary to run at 12:10 AM every day

LDAP

  • Corrected a "Student Employee" error
  • Continued discussions with Warren about updating the LDAP-to-HRS feed to generate a transaction that removes the netid from HRS when HRS stops sending the person's information to LDAP. He said it would take two weeks to design and implement the change.

Calendar

  • Received notification that the version of Oracle Calendar we are running will go off support on March 1, 2007.

October 25, 2005 -- Tuesday

Admin

  • Remove obsolete directory trees on Oracle DB server for DBA
  • Update diskgrowth.pl to use printf for nicely formatted output
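Perl's printf and Python's `%` operator share the same format specifiers, so the fixed-width idea can be sketched in Python. The column layout below (filesystem, before, after, growth in KB) and the function name are hypothetical; the real diskgrowth.pl output isn't shown here.

```python
def format_growth_report(rows):
    """rows: iterable of (filesystem, used_kb_before, used_kb_after).

    Returns fixed-width lines using printf-style formatting:
    %-20s left-justifies, %12d right-justifies, %+10d always shows the sign.
    """
    lines = ["%-20s %12s %12s %10s" % ("Filesystem", "Before(KB)", "After(KB)", "Growth")]
    for fs, before, after in rows:
        lines.append("%-20s %12d %12d %+10d" % (fs, before, after, after - before))
    return "\n".join(lines)


print(format_growth_report([("/rack2e", 1000, 1500), ("/rack3a", 2000, 1900)]))
```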

SecurID

  • Set up SecurID auth as default on replica server
  • Research and develop a script to make a nightly full backup of the database on the replica server with the processes down
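The shape of such a cold backup is: stop the processes, copy the database, and restart them even if the copy fails. Here is a minimal Python sketch. The `stop_services`/`start_services` callables are placeholders for whatever commands actually stop and start the ACE/Server processes (not named in this log), and the timestamped directory naming is an assumption.

```python
import shutil
import time
from pathlib import Path


def cold_backup(db_dir, backup_root, stop_services, start_services):
    """Copy db_dir into a timestamped directory under backup_root while the
    database processes are stopped; always restart them, even if the copy fails."""
    stop_services()
    try:
        target = Path(backup_root) / time.strftime("db-%Y%m%d-%H%M%S")
        shutil.copytree(db_dir, target)  # raises if target already exists
    finally:
        start_services()  # runs on success and on failure
    return target
```

Using `try`/`finally` means a failed copy still brings the server back up, which matters more on a replica than a missed night of backups.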

Calendar

  • Install Update 6 for RHEL/AS 3.0 on test machine
  • Verify that the calendar server still starts

October 24, 2005 -- Monday

Accounts

  • /users/a/c/account symlink was changed Sunday afternoon (16:13) to point at the DFS path. Reverted it to the local path to stop the hourly errors from mkacct.
  • moveuser script rewritten in perl and facility to determine best filesystem to move the account to reimplemented.
  • Update the scripts that used /users/a/c/account and /users/n/a/nameserv to use the local filesystem paths instead and put the symlinks back the way they were (since Mike explained why the change was made and I agree with it).
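The moveuser rewrite is Perl, and its actual selection rule isn't described here. As an illustration only, here is a Python sketch of one plausible rule: pick the candidate filesystem with the most free space that would stay under a usage ceiling after the move. The 90% ceiling and the function name are assumptions.

```python
import shutil


def pick_target_filesystem(mount_points, needed_bytes, max_used_fraction=0.90):
    """Return the mount point with the most free space that can take needed_bytes
    without going over max_used_fraction full, or None if none qualifies."""
    best, best_free = None, -1
    for mount in mount_points:
        usage = shutil.disk_usage(mount)  # named tuple: total, used, free (bytes)
        projected_used = usage.used + needed_bytes
        if needed_bytes > usage.free or projected_used > usage.total * max_used_fraction:
            continue  # would not fit, or would land too close to full
        if usage.free > best_free:
            best, best_free = mount, usage.free
    return best
```

Usage would look like `pick_target_filesystem(["/rack2e", "/rack3a"], needed_bytes=account_size)`, with the account size coming from a scan of the home directory.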

Calendar

  • TAR 4787052.993
    • Responded to Oracle's update from over the weekend, acknowledging that I understood the previous dbVISTA ERROR 10 action plan was not expected to actually eradicate the errors. They have forwarded my questions about procinfo and the timing issues to development.

October 21, 2005 -- Friday

Calendar:

  • TAR 4787052.993 -- Installed the ~oracle/bin/csmon.sh script, which requires iostat and procinfo. Found iostat in the sysstat package, but had to update the TAR asking where to get the procinfo program for RHEL3. Also asked why the default timeouts are set so that the standard unidbbackup, which Oracle wants run nightly, causes these dbVISTA ERROR 10s (lock timeouts).
  • TAR 4776115.993 -- Answered questions about the reproducibility of the issue. We haven't seen it again. However, we continue to see queue entries get "USER UNKNOWN" errors from sendmail and be put back in the retry queue until the users call and get us to remove the queue entry.
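For context, SMTP (RFC 5321) separates transient 4yz failures, which should be retried, from permanent 5yz failures such as 550 "User unknown", which should bounce. Here is a small Python sketch of that classification. How Oracle Calendar's queue actually handles these replies is what the TAR is about, so this only shows the expected behavior, and the function name is hypothetical.

```python
def classify_smtp_reply(code):
    """Map an SMTP reply code to a queue action.

    2yz: positive completion, message delivered, drop it from the queue.
    4yz: transient failure, keep it in the retry queue.
    5yz: permanent failure (e.g. 550 "User unknown"), bounce and drop; retrying cannot help.
    """
    if 200 <= code < 300:
        return "delivered"
    if 400 <= code < 500:
        return "retry"
    if 500 <= code < 600:
        return "bounce"
    raise ValueError(f"unexpected SMTP reply code for a delivery result: {code}")
```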

Account Scripts:

  • Updated accountname.pl and removeuser for the new DFS fileserver which is now housing home directories for various accounts

October 19, 2005 -- Wednesday

WebCT:

  • Finally got SSL working with LDAP authentication. Had to add the UVM ROOT CA certificate into the cacerts file.
  • Justin aimed it squarely at ldap.uvm.edu instead of at the test machine and it totally surprised me by working.
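WebCT is Java, so the fix there was importing the root CA into the JRE's cacerts keystore. The same idea in Python, as an analogue rather than anything WebCT does: build a TLS context that also trusts an extra CA file, then verify the server against it. The CA file path is hypothetical, and port 636 (LDAPS) is assumed rather than StartTLS on 389.

```python
import socket
import ssl


def make_ldaps_context(ca_file=None):
    """TLS client context that verifies the server against the system CAs plus,
    optionally, ca_file (e.g. a PEM copy of a campus root CA)."""
    context = ssl.create_default_context()  # CERT_REQUIRED and hostname checking are on
    if ca_file is not None:
        context.load_verify_locations(cafile=ca_file)
    return context


def check_ldaps(host, port=636, ca_file=None, timeout=10):
    """Open and immediately close a verified TLS connection; return the negotiated protocol."""
    context = make_ldaps_context(ca_file)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

Something like `check_ldaps("ldap.uvm.edu", ca_file="/path/to/root-ca.pem")` (hypothetical path) fails with a certificate verification error until the right CA is loaded.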

LDAP:

  • Had two "We're modifying the wrong entry" failures in the nightly update. They happened when Banner provided the SSNs of two students and the code detected that an HRS entry already held the same SSN that Banner was trying to add to a different entry.
    • Merged the four entries into two active and two expired and then purged the two unused accounts.
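The check that stopped the nightly update amounts to a uniqueness test on the SSN before it is attached to an entry. Here is a hypothetical Python sketch: the dictionary stands in for a directory search, and the entry IDs and 000-00-000x values are made up.

```python
class DuplicateKeyError(Exception):
    """Raised when a key already belongs to a different entry."""


def assign_ssn(index, entry_id, ssn):
    """Record that entry_id carries ssn; refuse if another entry already has it.

    index maps ssn -> entry_id and stands in for a directory search on the SSN attribute.
    """
    owner = index.get(ssn)
    if owner is not None and owner != entry_id:
        raise DuplicateKeyError(f"SSN already on {owner}; refusing to add it to {entry_id}")
    index[ssn] = entry_id
```

Refusing the write, rather than silently moving the SSN, is what leaves the entries for a person to merge by hand, as was done here.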

Hardware:

  • Auditing the list of Sun hardware that is on maintenance.

October 18, 2005 -- Tuesday

eMail:

  • Multiple FootPrints issues about hung imapd processes from yesterday. It seems the SMTP overload caused by the massive email surge (was it spam?) at about 11:30 AM left several clients with multiple locked-up connections.

WebCT:

  • Downloaded, installed, and ran the configuration checker for WebCT 6.0. The only problem it found was the newer kernel, because we are running RHEL 3 Update 6 instead of RHEL 3 Update 4 (big deal).

LDAP:

  • Banner delivered a student who had two IDs in the past, sending both ID numbers, so the nightly update process could not determine which one to reactivate.
    • I reactivated the newest one because that was the one that Banner actually thought it knew about and neither account has ever (and I do mean EVER) been used by the graduate student.
    • I then deleted the older one so the nightly update process would not blow up and refuse to run with a duplicate ID number problem tomorrow morning.

October 17, 2005 -- Monday

Calendar:

  • Opened TAR 4787052.993 to deal with the DB_VISTA ERROR 10 that happened due to a Windows 9.0.4.2 client having a lock timeout error during the nightly unidbbackup run last Wednesday night. Severity 3.

  • Opened TAR 4787124.993 to deal with Ed's ERROR 1714: he is unable to install the 10.1.1.0.2 Windows client because the older client cannot be uninstalled. (I would have recommended that Ed reload the system, since he did some manual removal before reporting the problem. However, I've heard of at least one other person with the same symptom, an older version that would not uninstall and stayed in place, who had not manually deleted any files or registry entries.)

    • Oracle's recommendation to reinstall the 9.0.4.2 client showed the directory where the existing installation still thought the old installer should be. Ed was able to delete that directory and then the installation worked.
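Here is a generic illustration of the dbVISTA ERROR 10 failure mode in the first TAR, not Calendar's actual locking code: if the nightly backup holds a lock longer than a client is willing to wait, the client's request fails with a lock timeout. A Python `threading.Lock` stands in for the database lock.

```python
import threading


def timed_write(lock, timeout):
    """Try to take the lock within `timeout` seconds; report a timeout instead of blocking forever."""
    if not lock.acquire(timeout=timeout):
        return "lock timeout"
    try:
        return "ok"  # the real update would run here while the lock is held
    finally:
        lock.release()


db_lock = threading.Lock()

db_lock.acquire()                          # the nightly backup takes the lock...
print(timed_write(db_lock, timeout=0.1))   # prints: lock timeout
db_lock.release()                          # ...and releases it when it finishes
print(timed_write(db_lock, timeout=0.1))   # prints: ok
```

The two obvious knobs are a client timeout longer than the backup's lock hold time, or a backup that holds locks for less time.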

WebCT:

  • Set up trace to see why WebCT 6.0 is UNABLE to authenticate with LDAP.

October 15, 2005 -- Saturday

LDAP: Student Employee error handled

Accounts: Removed the old account name involved in yesterday's error from the acctfile it would have been removed from had the code not broken.

Calendar: unidbfix.

October 14, 2005 -- Friday

LDAP: "Student Employee" error cleared up

Calendar: Schedule database repair for Saturday morning (Grr)

Accounts: Examine why the account rename failed: the code sent a request to modify the LDAP entry, but used one of the email addresses as an attribute name. That's not going to work... :-/
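LDAP attribute names have a strict syntax (RFC 4512): either a descriptor that starts with a letter and contains only letters, digits, and hyphens, or a numeric OID. Here is a Python sketch of a guard that would have caught an email address in the attribute-name position before the modify request went out. The function names are hypothetical, and attribute options such as `;binary` are not handled.

```python
import re

# RFC 4512: descr = leadkeychar *keychar (a letter, then letters, digits, or hyphens)
_DESCR = re.compile(r"[A-Za-z][A-Za-z0-9-]*\Z")
# RFC 4512: numericoid = number 1*( DOT number ), with no leading zeros in a number
_NUMERICOID = re.compile(r"(?:0|[1-9][0-9]*)(?:\.(?:0|[1-9][0-9]*))+\Z")


def is_valid_attribute_type(name):
    """True if name is a syntactically valid attribute type (options are not handled)."""
    return bool(_DESCR.match(name) or _NUMERICOID.match(name))


def build_replace_changes(changes):
    """Hypothetical helper: turn {attribute: [values]} into replace operations,
    refusing any key that is not a valid attribute name (e.g. an email address)."""
    bad = [attr for attr in changes if not is_valid_attribute_type(attr)]
    if bad:
        raise ValueError(f"invalid attribute name(s): {bad}")
    return [("replace", attr, values) for attr, values in changes.items()]
```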

Oracle App: poked a hole in the firewall on the test machine for a new app server.

WebCT: Found out that the 6.0 code is completely stupid when it comes to reading SSL certificate files. I generally create the files with both the -text and -out flags, so there is the human-readable text at the top of the file. Well, the Java SSL code in WebCT 6.0 cannot grok that file. It has to be just the PEM certificate and nothing else in the file. dum-da-dum-dum...
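The simplest fix is to write the certificate without `-text` (`openssl x509` writes only the PEM block by default). For files that already carry the text dump, here is a small Python sketch that keeps only the PEM certificate block(s). The function name is hypothetical.

```python
import re

# Non-greedy body so each BEGIN pairs with its own END when several certs are present.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----\r?\n.*?\r?\n-----END CERTIFICATE-----",
    re.DOTALL,
)


def pem_only(text):
    """Return just the PEM certificate block(s) from text, dropping the
    human-readable dump that `openssl x509 -text` puts in front of them."""
    blocks = PEM_CERT.findall(text)
    if not blocks:
        raise ValueError("no PEM certificate found")
    return "\n".join(blocks) + "\n"
```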

October 12, 2005 -- Wednesday

WebCT: Set up the Kerberos config file for WebCT (uses MIT Kerberos servers). Downloaded and skimmed the WebCT 6.0 Admin Guide... The new Kerberos implementation bypasses the krb5.conf file... YUCK! Reading more... Going with LDAP.

Now SSL has caused the server to crap out... opening another issue with WebCT support... oi!

Calendar: Another batch of repeated mail messages sent because an email address was bad (11@ is not a valid address). Purge the queued message; it's been delivered multiple times now. Open TAR 4776115.993 with Oracle to see why this might be happening... Oi!

Oracle's first response... you screwed up... Yeah, we sure did... we installed an Oracle product.
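A rough pre-queue sanity check would have kept "11@" out of the queue in the first place. Here is a Python sketch that is deliberately loose and far from full RFC 5322 parsing: it only requires a non-empty local part and a dotted domain with no empty labels. Whether Calendar does any such check is unknown, and the function name is hypothetical.

```python
def looks_deliverable(address):
    """Rough sanity check, not full RFC 5322 parsing: a non-empty local part
    and a domain with at least two non-empty dot-separated labels."""
    local, sep, domain = address.rpartition("@")  # split on the last "@"
    if not sep or not local or not domain:
        return False
    labels = domain.split(".")
    return len(labels) >= 2 and all(labels)
```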

October 11, 2005 -- Tuesday

Legato: Do some research for Geoff about what is wrong with Legato and why it keeps yelling that `SYSTEM' on `(client)' must have remote access privilege to client `(former client machine which is now gone)'.

Found one possible hit but it was specific to a misconfigured Linux machine. Geoff needs to push on John to work this with Legato.

Power: Installed two new 208V/30A PDUs
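The log doesn't say whether these are single-phase or three-phase circuits, so the capacity sketch below handles both. It assumes the usual North American 80% continuous-load derating. For a single-phase 208V/30A circuit that works out to 6,240 VA nameplate, or 4,992 VA usable at 80%.

```python
import math


def pdu_capacity_va(volts, amps, three_phase=False, continuous_derate=0.8):
    """Usable apparent power (VA) for a circuit, derated for continuous load.

    Single-phase: V * A. Three-phase with line-to-line volts: sqrt(3) * V * A.
    The 0.8 default reflects the common 80% continuous-load rule of thumb.
    """
    va = volts * amps * (math.sqrt(3) if three_phase else 1.0)
    return va * continuous_derate


print(round(pdu_capacity_va(208, 30)))                    # single-phase, derated
print(round(pdu_capacity_va(208, 30, three_phase=True)))  # three-phase, derated
```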

Identity Management: Reviewing trip notes, the Grouper and Signet projects, and other EDUCAUSE objectclass definitions.

October 10, 2005 -- Monday

LDAP: Cleared up two "Student Employee" errors in HRS feed from over the weekend.

RSA: Opened C0529431 to get Authentication Agent for PAM v5.3.4 working on RHEL3. The solution was to create an sdopts.rec file in /var/ace with the real client address specified in it (CLIENT_IP=132.198.xxx.xxx). Apparently, the Linux version of this agent incorrectly grabs the loopback address when it first connects to exchange the node secret file. Sounds like 5.3.5 needs to be written... Hmm, perhaps something as stupid as making sure the address you pick is NOT 127.0.0.1... but what do I know, I'm not a programmer ;-)
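The "don't pick 127.0.0.1" check can be sketched in Python. This is an illustration, not the RSA agent's code: it resolves this host's own name, skips loopback results, and formats the CLIENT_IP line for sdopts.rec. If the resolver returns only loopback addresses for the hostname, `pick_client_ip` returns None and the address still has to be set by hand. The function names are hypothetical.

```python
import ipaddress
import socket


def pick_client_ip(candidates):
    """Return the first IPv4 address in candidates that is not loopback (127.0.0.0/8), else None."""
    for text in candidates:
        addr = ipaddress.ip_address(text)
        if addr.version == 4 and not addr.is_loopback:
            return str(addr)
    return None


def local_ipv4_candidates():
    """IPv4 addresses the resolver returns for this host's own name."""
    infos = socket.getaddrinfo(socket.gethostname(), None, socket.AF_INET)
    return [info[4][0] for info in infos]  # sockaddr is (address, port)


def sdopts_line(ip):
    """Format the sdopts.rec override line used as the workaround."""
    return f"CLIENT_IP={ip}\n"
```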

WebCT: Worked with Justin and MJG facilitating the installation of WebCT 6.0 -- ran into two snags:


  1. X11 on OS/X 10.4 did not work: the java/python (jython) install code didn't work with it running as the server. The fix for this one was to use Xwin32 on a Windows laptop (we didn't have X11 on a Linux machine).
  2. The installer code was looking for Linux, Solaris, or Windows as the OS of the database server. First off... Oracle is Oracle, so we have no idea why that query was really being done (perhaps to eventually determine whether the DB server was running Oracle or Microsoft SQL Server). The fix was to modify the Python script that was doing the query to just lie and report that the OS was LINUX. Still have an open issue with WebCT support about this; we haven't told them we fixed it.

October 6, 2005 -- Thursday

SecurID: Finish config of replica server...

LDAP: Banner merged two id's and says the one that is being used is the one to delete... GRRR

Mail: someone important left... "take care" of their account...

WebCT: set up meeting... exchange email... SecurID config...

October 5, 2005 -- Wednesday

WebCT(6):


  • Installed/configured ntp
  • Discovered that Legato is better able to back up a system when the client code is actually running. Seems simply installing it isn't enough (Duh!)
  • Disk connected to DB server, and filesystem created for db.

Nexsan: kick off surface scan to resolve the bad data block problem from the rebuild. Disk8 is throwing a ton of errors. Sent off a new log/config dump to Nexsan and they agree disk8 is bad and are sending a replacement.

mouse: replace failed disk drive... inquire for options to purchase more cold spares from TriniComp and DataTrend. Ordered two (2) from DataTrend.

SecurID: prepare to install replica server...


  • replica server configured and installed...
  • will make it production (aka pass out the new sdconf.rec file) tomorrow

October 4, 2005 -- Tuesday

WebCT: new system built for WebCT 6

Nexsan: gathering more information to clean up the rest of the errors.

Created a couple of blogs...

October 3, 2005 -- Monday

Nexsan: Upgrade firmware/fail drive -- it's been hanging every few hours (four times within 20 hours). Monitor it going forward...

power: Request power install in machine room to handle new PDUs.