MOS for some SR • Diagnostic collections accelerate SR resolution • Cluster-aware ADR log inspection and management • Advanced system and log monitoring • Incident control and notification • Connect to MOS • SMTP, REST APIs
400 identified as critical/failures • Severe problem check daily: 2AM • All known problem check weekly: 3AM Sunday • Auto-generates a collection when problems are detected • Everything required to diagnose & resolve • Results delivered to the notification email address
TFA • A version downloaded from MOS • A version included in Grid Infrastructure install & patches • GI version is not fully featured • GI and MOS versions can interfere or conflict
Cluster awareness • Full AHF capabilities • Includes compliance checks • Enables notifications • Automatic diagnostic collection when issues are detected • May conflict with existing AHF/TFA installations
• No automatic or remote diagnostics or collections • Limited file visibility (files must be readable by the Oracle home owner) • /var/log/messages • Some Grid Infrastructure logs • May co-exist with Daemon installations • No special pre-install considerations
for d in $(find / -name uninstalltfa); do
  cd "$(dirname "$d")"
  ./tfactl uninstall
  # cd .. && rm -fr .
done
# Ensure ALL AHF/TFA processes are stopped/inactive prior to uninstall
# PERFORM THIS STEP ON ALL NODES
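Because `find /` can turn up more than one `uninstalltfa` (for example the MOS download and the copy shipped with Grid Infrastructure), it may help to review what the loop above will touch before running it. A minimal sketch, assuming bash; `plan_tfa_uninstall` and the `DRY_RUN` variable are hypothetical names, and `./tfactl uninstall` is the same step the loop runs in each directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: find every uninstalltfa script under a search root and
# either print the uninstall step (DRY_RUN=1, the default) or run it (DRY_RUN=0).
plan_tfa_uninstall() {
    local search_root="${1:-/}"   # "/" matches the loop above; narrow it if you can
    local script dir
    # -print0 with read -d '' keeps paths containing spaces intact
    while IFS= read -r -d '' script; do
        dir=$(dirname "$script")
        if [ "${DRY_RUN:-1}" = "1" ]; then
            printf 'would run: cd %s && ./tfactl uninstall\n' "$dir"
        else
            # Subshell, so the caller's working directory does not change
            ( cd "$dir" && ./tfactl uninstall )
        fi
    done < <(find "$search_root" -name uninstalltfa -print0 2>/dev/null)
}
```

Run `plan_tfa_uninstall` first to review the list, then `DRY_RUN=0 plan_tfa_uninstall` on each node once all AHF/TFA processes are stopped.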
Current Node List in TFA :
1. node1
2. node2
Node List in Cluster :
1. node1
2. node2
Node List to sync TFA Certificates :
1 node2
Do you want to update this node list? Y|[N]:
Syncing TFA Certificates on node2 :
TFA_HOME on node2 : /opt/oracle.ahf/tfa
...
| Repository Parameter      | Value                           |
+---------------------------+---------------------------------+
| Old Location              | /opt/oracle.ahf/data/repository |
| New Location              | /some/directory/repository      |
| Current Maximum Size (MB) | 10240                           |
| Current Size (MB)         | 0                               |
| Status                    | OPEN                            |
'---------------------------+---------------------------------'
# Repository commands are applied only on the local node
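Before moving the repository, it is worth confirming that the target filesystem can hold the configured maximum (10240 MB in the output above). A minimal sketch using POSIX `df -Pk`; `repo_dir_has_space` is a hypothetical name, and comparing against the maximum size is an assumption about what "enough space" means, not a TFA rule.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check: does the filesystem holding DIR have at least
# REQUIRED_MB megabytes free? Returns 0 if yes, 1 if not (or on error).
repo_dir_has_space() {
    local dir="$1" required_mb="${2:-10240}"   # 10240 MB = maximum size shown above
    local avail_kb
    # df -Pk prints one POSIX-format line per filesystem after the header;
    # column 4 is the available space in 1024-byte blocks.
    avail_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR==2 {print $4}')
    [ -n "$avail_kb" ] || return 1             # df failed (e.g. path missing)
    if [ "$avail_kb" -ge $(( required_mb * 1024 )) ]; then
        return 0
    fi
    printf '%s: %d MB available, %d MB required\n' \
        "$dir" $(( avail_kb / 1024 )) "$required_mb" >&2
    return 1
}
```

`df` needs an existing path, so pass the parent of the new repository location if the directory has not been created yet (for example `repo_dir_has_space /some/directory`).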
tfactl set trimfiles=ON # default
tfactl set reposizeMB= # default=10240
tfactl set rtscan=ON # default
tfactl set redact=mask # default=none
# Disk space monitoring
tfactl set diskUsageMon=ON # default=OFF
tfactl set diskUsageMonInterval=240 # Depends on activity. default=60
# Log purge
tfactl set autopurge=ON # If space is tight. default=OFF
tfactl set manageLogsAutoPurge=ON # default=OFF
tfactl set manageLogsAutoPurgeInterval=720 # Set to 12 hours. default=60
tfactl set manageLogsAutoPurgePolicyAge=30d # default=30
tfactl set minfileagetopurge=48 # default=12
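Keeping these settings in one list makes them easier to review and to reapply on each node or after an upgrade. A minimal sketch, assuming bash; `TFA_SETTINGS`, `apply_tfa_settings`, and the `DRY_RUN`/`TFACTL` variables are hypothetical. The values are copied from the settings list above, and `reposizeMB=` is left out because it carries no value.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around "tfactl set"; values copied from the settings list.
TFA_SETTINGS=(
    trimfiles=ON
    rtscan=ON
    redact=mask
    diskUsageMon=ON
    diskUsageMonInterval=240
    autopurge=ON
    manageLogsAutoPurge=ON
    manageLogsAutoPurgeInterval=720
    manageLogsAutoPurgePolicyAge=30d
    minfileagetopurge=48
)

apply_tfa_settings() {
    local tfactl="${TFACTL:-tfactl}" setting rc=0
    for setting in "${TFA_SETTINGS[@]}"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Default: print the command instead of running it
            printf '%s set %s\n' "$tfactl" "$setting"
        elif ! "$tfactl" set "$setting"; then
            printf 'failed: %s set %s\n' "$tfactl" "$setting" >&2
            rc=1   # keep going, but report the failure at the end
        fi
    done
    return "$rc"
}
```

`apply_tfa_settings` prints the ten commands for review; `DRY_RUN=0 apply_tfa_settings` runs them.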
syntax may not match docs • Run tfactl <command> -h or tfactl <command> help • Some commands are specific to a user (root, oracle, grid) • Regressions (usually minor) • Don’t build complex automation on new features • Don’t (always) rush to upgrade to the latest version • Example: GI can’t always see/manage DB & vice versa
Some require quotes, some don’t, some work either way • Some take double quotes, others take single quotes • YYYY/MM/DD or YYYY-MM-DD or YYYYMMDD or … • Some take dates and times separately • Sometimes there are -d and -t flags • Some take timestamps • Some work with either, others accept only one
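Because the accepted date formats vary so much, one option is to normalize whatever form you have into a single one before building a command line. A minimal sketch in pure bash (no GNU `date` needed); `to_tfa_date` is a hypothetical name, and the target `Mon/dd/yyyy` form is the one listed for `-from`, `-to`, and `-for` in the `diagcollect` help output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert YYYY/MM/DD, YYYY-MM-DD or YYYYMMDD to Mon/dd/yyyy.
to_tfa_date() {
    local input="$1" y m d
    local -a months=(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)
    if [[ $input =~ ^([0-9]{4})/([0-9]{2})/([0-9]{2})$ ]] ||
       [[ $input =~ ^([0-9]{4})-([0-9]{2})-([0-9]{2})$ ]] ||
       [[ $input =~ ^([0-9]{4})([0-9]{2})([0-9]{2})$ ]]; then
        y=${BASH_REMATCH[1]}; m=${BASH_REMATCH[2]}; d=${BASH_REMATCH[3]}
    else
        printf 'unrecognized date format: %s\n' "$input" >&2
        return 1
    fi
    # 10# forces base 10 so "08" and "09" are not parsed as invalid octal.
    # Only coarse range checks; days-per-month is not validated.
    if (( 10#$m < 1 || 10#$m > 12 || 10#$d < 1 || 10#$d > 31 )); then
        printf 'month or day out of range: %s\n' "$input" >&2
        return 1
    fi
    printf '%s/%s/%s\n' "${months[10#$m - 1]}" "$d" "$y"
}
```

For example, `tfactl diagcollect -for "$(to_tfa_date 2022-03-15)"` expands to `tfactl diagcollect -for "Mar/15/2022"`.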
[root@node1 ~]# tfactl diagcollect -h
Collect logs from across nodes in cluster
Usage : /opt/oracle.ahf/tfa/bin/tfactl diagcollect [ [component_name1] [component_name2] ... [component_nameN] | [-srdc <srdc_profile>] | [-defips]]
    [-sr <SR#>] [-node <all|local|n1,n2,..>] [-tag <tagname>] [-z <filename>] [-acrlevel <system,database,userdata>]
    [-last <n><m|h|d>| -from <time> -to <time> | -for <time>] [-nocopy] [-notrim] [-silent] [-cores]
    [-collectalldirs] [-collectdir <dir1,dir2..>] [-collectfiles <file1,..,fileN,dir1,..,dirN> [-onlycollectfiles]] [-examples]
components:-ips|-database|-asm|-crsclient|-dbclient|-dbwlm|-tns|-rhp|-procinfo|-cvu|-afd|-crs|-cha|-wls|-emagenti|-emagent|-oms|-omsi|-ocm|-emplugins|-em|-acfs|-install|-cfgtools|-os|-ashhtml|-ashtext|-awrhtml|-awrtext|-sosreport|-qos|-ahf|-dataguard
-srdc             Service Request Data Collection (SRDC).
-database         Specify comma separated list of db unique names for collection
-defips           Include in the default collection the IPS Packages for: ASM, CRS and Databases
-sr               Enter SR number to which the collection will be uploaded
-node             Specify comma separated list of host names for collection
-tag <tagname>    The files will be collected into tagname directory inside repository
-z <zipname>      The collection zip file will be given this name within the TFA collection repository
-last <n><m|h|d>  Files from last 'n' [m]inutes, 'n' [d]ays or 'n' [h]ours
-since            Same as -last. Kept for backward compatibility.
-from "Mon/dd/yyyy hh:mm:ss"  From <time> or "yyyy-mm-dd hh:mm:ss" or "yyyy-mm-ddThh:mm:ss" or "yyyy-mm-dd"
...
[root@node1 ~]# tfactl diagcollect -examples
Examples:
/opt/oracle.ahf/tfa/bin/tfactl diagcollect
    Trim and Zip all files updated in the last 1 hours as well as chmos/osw data from across the cluster and collect at the initiating node
    Note: This collection could be larger than required but is there as the simplest way to capture diagnostics if an issue has recently occurred.
/opt/oracle.ahf/tfa/bin/tfactl diagcollect -last 8h
    Trim and Zip all files updated in the last 8 hours as well as chmos/osw data from across the cluster and collect at the initiating node
/opt/oracle.ahf/tfa/bin/tfactl diagcollect -database hrdb,fdb -last 1d -z foo
    Trim and Zip all files from databases hrdb & fdb in the last 1 day and collect at the initiating node
...
[oracle@node1 ~]$ tfactl analyze -examples
Examples:
/opt/oracle.ahf/tfa/bin/tfactl analyze -since 5h
    Show summary of events from alert logs, system messages in last 5 hours.
/opt/oracle.ahf/tfa/bin/tfactl analyze -comp os -since 1d
    Show summary of events from system messages in last 1 day.
/opt/oracle.ahf/tfa/bin/tfactl analyze -search "ORA-" -since 2d
    Search string ORA- in alert and system logs in past 2 days.
/opt/oracle.ahf/tfa/bin/tfactl analyze -search "/Starting/c" -since 2d
    Search case sensitive string "Starting" in past 2 days.
/opt/oracle.ahf/tfa/bin/tfactl analyze -comp osw -since 6h
    Show OSWatcher Top summary in last 6 hours.
...
Watcher logs/output
tfactl analyze
# Options:
-search "pattern"      # Search in DB and CRS alert logs
                       # Sets the search period to -last 1h
                       # Override with -last xh|xd
-verbose
timeline file1 file2   # Shows timeline for specified files
analyze (Only runs as root in 21.4)
sga
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
allow_group_access_to_sga            boolean     FALSE
lock_sga                             boolean     FALSE
pre_page_sga                         boolean     TRUE
sga_max_size                         big integer 1536M
sga_min_size                         big integer 0
sga_target                           big integer 1536M
unified_audit_sga_queue_size         integer     1048576
param (Broken in 21.4)
Error: pstack command not found in system. If its installed, please set the PATH and try again.
yum install -y gdb
tfactl> pstack mmon
Output from host : vna1
------------------------------
# pstack output for pid : 15318
#0 0x00007f33bac6928a in semtimedop () from /lib64/libc.so.6
#1 0x0000000011c58285 in sskgpwwait ()
#2 0x0000000011c543db in skgpwwait ()
#3 0x000000001144ccba in ksliwat ()
#4 0x000000001144c06c in kslwaitctx ()
#5 0x0000000011a6fd40 in ksarcv ()
#6 0x00000000038174fa in ksbabs ()
#7 0x0000000003835ab3 in ksbrdp ()
#8 0x0000000003c19a4d in opirip ()
#9 0x00000000024c23e5 in opidrv ()
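The error above only surfaces once `pstack` is invoked. A small pre-flight check can report missing tools up front. A minimal sketch; `check_debug_tools` is a hypothetical name, and the `yum install -y gdb` hint mirrors the slide and assumes a yum-based system where `pstack` is provided by the `gdb` package.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: report every requested tool missing from PATH.
# Returns 0 if all are present, 1 otherwise.
check_debug_tools() {
    local tool missing=0
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 && continue
        missing=1
        case "$tool" in
            # Assumption: yum-based host where gdb provides pstack
            pstack|gdb) printf 'missing: %s (try: yum install -y gdb)\n' "$tool" ;;
            *)          printf 'missing: %s (install it or add it to PATH)\n' "$tool" ;;
        esac
    done
    return "$missing"
}
```

Running `check_debug_tools pstack` on each node before a `tfactl> pstack` session shows which hosts still need the package.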
diagcollect -srdc -help
<srdc_profile> can be any of the following,
DBCORRUPT         Required Diagnostic Data Collection for a Generic Database Corruption
DBDATAGUARD       Required Diagnostic Data Collection for Data Guard issues including Broker
Listener_Services SRDC - Data Collection for TNS-12516 / TNS-12518 / TNS-12519 / TNS-12520.
Naming_Services   SRDC - Data Collection for ORA-12154 / ORA-12514 / ORA-12528.
ORA-00020         SRDC for database ORA-00020 Maximum number of processes exceeded
ORA-00060         SRDC for ORA-00060. Internal error code.
ORA-00494         SRDC for ORA-00494.
ORA-00600         SRDC for ORA-00600. Internal error code.
...
ora4023           SRDC - ORA-4023 : Checklist of Evidence to Supply
ora4063           SRDC - ORA-4063 : Checklist of Evidence to Supply
ora445            SRDC - ORA-445 or Unable to Spawn Process: Checklist of Evidence to Supply (Doc ID 2500730.1)
xdb600            SRDC - Required Diagnostic Data Collection for XDB ORA-00600 and ORA-07445
zlgeneric         SRDC - Zero Data Loss Recovery Appliance (ZDLRA) Data Collection.
<all|local|n1,n2,..>
-defips   Include in the default collection the IPS Packages for: ASM, CRS and Databases
-sr       Enter SR number to which the collection will be uploaded
-node     Specify comma separated list of host names for collection
-from <time> -to <time> -for <time> -last <n><m|h|d>
-last <n><m|h|d>              Files from last 'n' [m]inutes, 'n' [d]ays or 'n' [h]ours
-since                        Same as -last. Kept for backward compatibility.
-from "Mon/dd/yyyy hh:mm:ss"  From <time>
                              or "yyyy-mm-dd hh:mm:ss"
                              or "yyyy-mm-ddThh:mm:ss"
                              or "yyyy-mm-dd"
-to "Mon/dd/yyyy hh:mm:ss"    To <time>
                              or "yyyy-mm-dd hh:mm:ss"
                              or "yyyy-mm-ddThh:mm:ss"
                              or "yyyy-mm-dd"
-for "Mon/dd/yyyy"            For <date>.
                              or "yyyy-mm-dd"
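When the incident time is known, `-from`/`-to` is usually tighter than `-last`. A minimal sketch that derives the window from an incident timestamp and prints the resulting command instead of running it; it assumes GNU `date` (for `-d` and `@epoch`), `diag_window` and its arguments are hypothetical, and the `yyyy-mm-dd hh:mm:ss` form is one of the formats listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a diagcollect command covering HOURS_BEFORE hours
# before and HOURS_AFTER hours after an incident ("yyyy-mm-dd hh:mm:ss").
# Requires GNU date. Parsing and formatting both use UTC, so the arithmetic is
# pure wall-clock hours and a DST change cannot shift the window.
diag_window() {
    if [ "$#" -lt 3 ]; then
        echo "usage: diag_window 'yyyy-mm-dd hh:mm:ss' HOURS_BEFORE HOURS_AFTER [component ...]" >&2
        return 2
    fi
    local incident="$1" before_h="$2" after_h="$3" epoch from to
    shift 3                                   # remaining args are components
    epoch=$(date -u -d "$incident" +%s) || return 1
    from=$(date -u -d "@$(( epoch - before_h * 3600 ))" '+%Y-%m-%d %H:%M:%S')
    to=$(date -u -d "@$(( epoch + after_h * 3600 ))" '+%Y-%m-%d %H:%M:%S')
    # Print rather than run, so the window can be checked first
    printf 'tfactl diagcollect'
    [ "$#" -gt 0 ] && printf ' %s' "$@"
    printf ' -from "%s" -to "%s"\n' "$from" "$to"
}
```

For example, `diag_window '2022-03-15 21:00:00' 12 12` reproduces the window of the `-for "Mar/15/2022 21:00:00"` example (09:00 on Mar 15 to 09:00 on Mar 16) as an explicit `-from`/`-to` pair.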
<tagname> -z <zipname> -collectalldirs -collectdir <dir1,dir2..> -collectfiles <file1,..,fileN,dir1,..,dirN> [-onlycollectfiles]
-nocopy          Does not copy back the zip files to initiating node from all nodes
-notrim          Does not trim the files collected
-tag <tagname>   The files will be collected into tagname directory inside the repository
-z <zipname>     The collection zip file will be given this name in the collection repo
-collectalldirs  Collect all files from a directory marked "Collect All" flag to true
-collectdir      Specify a comma separated list of directories and the collection will include all files from these irrespective of type and time constraints in addition to the components specified
-collectfiles    Specify a comma separated list of files/directories and the collection will include the files and directories in addition to the components specified. if -onlycollectfiles is also used, then no other components will be collected.
tfactl set redact=mask
tfactl set redact=sanitize
tfactl set redact=none
sanitize: Replaces sensitive data in collections with random characters
mask: Replaces sensitive data in collections with asterisks (*)
none: No redaction (default)
files updated in the last 1 hours as well as chmos/osw data from across the cluster and collect at the initiating node
    Note: This collection could be larger than required but is there as the simplest way to capture diagnostics if an issue has recently occurred.
tfactl diagcollect -last 8h
    Trim and Zip all files updated in the last 8 hours as well as chmos/osw data from across the cluster and collect at the initiating node
tfactl diagcollect -database hrdb,fdb -last 1d -z foo
    Trim and Zip all files from databases hrdb & fdb in the last 1 day and collect at the initiating node
tfactl diagcollect -crs -os -node node1,node2 -last 6h
    Trim and Zip all crs files, o/s logs and chmos/osw data from node1 & node2 updated in the last 6 hours and collect at the initiating node
"Mar/15/2022" -to "Mar/15/2022 21:00:00"
    Trim and Zip all ASM logs from node1 updated between from and to time and collect at the initiating node
tfactl diagcollect -for "Mar/15/2022"
    Trim and Zip all log files updated on "Mar/15/2022" and collect at the initiating node
tfactl diagcollect -for "Mar/15/2022 21:00:00"
    Trim and Zip all log files updated from 09:00 on "Mar/15/2022" to 09:00 on "Mar/16/2022" (i.e. 12 hours before and after the time given) and collect at the initiating node
tfactl diagcollect -crs -collectdir /tmp_dir1,/tmp_dir2
    Trim and Zip all crs files updated in the last 1 hours. Also collect all files from /tmp_dir1 and /tmp_dir2 at the initiating node
Report space variations over time
# Reporting
tfactl managelogs -show usage               # Show all space use in ADR
tfactl managelogs -show usage -gi           # Show GI space use
tfactl managelogs -show usage -database     # Show DB space use
tfactl managelogs -show usage -saveusage    # Save usage for variation reports
# Report space use variation
tfactl managelogs -show variation -since 1d
tfactl managelogs -show variation -since 1d -gi
tfactl managelogs -show variation -since 1d -database
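Variation reports appear to rely on usage saved earlier with `-saveusage`, so the snapshot needs to run on a schedule before `-show variation` has anything to compare against. A minimal sketch of a wrapper that could be run daily from cron on each node; `adr_usage_report` and the `TFACTL` override are hypothetical, and the commands are the ones listed above.

```shell
#!/usr/bin/env bash
# Hypothetical snapshot-and-report wrapper around the managelogs commands above.
# TFACTL may name another binary or a shell function (useful for testing).
adr_usage_report() {
    local tfactl="${TFACTL:-tfactl}" since="${1:-1d}"
    # Save current usage so later variation reports have a baseline
    "$tfactl" managelogs -show usage -saveusage || return 1
    # Report the change over the requested period, GI and database separately
    "$tfactl" managelogs -show variation -since "$since" -gi || return 1
    "$tfactl" managelogs -show variation -since "$since" -database || return 1
}
```

Calling `adr_usage_report` once a day builds up the history; `adr_usage_report 7d` reports the weekly change instead.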
can take a long time for: • Large directories • Many files • NOTE: Purge operation loops over files • Strategies for first-time purge: • Delete in batches by age: 365 days, 180 days, 90 days, etc. • Delete database and GI homes separately • Delete for individual SIDs, nodes
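The batching strategy above (oldest files first, GI and database homes separately) can be laid out as a plan that prints commands for review. A minimal sketch; `plan_batched_purge` is hypothetical, and the `managelogs -purge -older <n>d -gi|-database` syntax is assumed from Oracle's documented `managelogs` usage, so confirm it with `tfactl managelogs -h` on your version before running anything.

```shell
#!/usr/bin/env bash
# Hypothetical planner for a first-time purge: oldest batch first, GI and
# database ADR homes handled separately. Prints the commands only.
# Assumption: "-purge -older <n>d" plus -gi / -database; verify with -h.
plan_batched_purge() {
    local -a ages
    if [ "$#" -gt 0 ]; then
        # Oldest batch first, whatever order the ages were given in
        mapfile -t ages < <(printf '%s\n' "$@" | sort -rn)
    else
        ages=(365 180 90)   # days, as in the slide
    fi
    local age scope
    for age in "${ages[@]}"; do
        for scope in -gi -database; do
            printf 'tfactl managelogs -purge -older %sd %s\n' "$age" "$scope"
        done
    done
}
```

Running the printed commands one at a time keeps each batch small enough to finish and lets you stop if a batch takes longer than expected.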
deleted if subdirectories under ADR_HOME are not owned by grid/oracle or oinstall/dba • One mis-owned subdirectory • No files under that ADR_HOME will be purged • Even subdirectories with correct ownership! • Depending on version • grid may not be able to delete files in database ADR_HOMEs • oracle may not be able to delete files in GI ADR_HOMEs
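Since one mis-owned subdirectory can block purging for a whole ADR_HOME, it helps to look for such directories before purging. A minimal sketch using `find`; `find_misowned_dirs` is hypothetical, it reads the slide as "owner must be grid or oracle and group must be oinstall or dba" (an interpretation, adjust to your accounts), and the named users and groups must exist on the host (numeric IDs also work).

```shell
#!/usr/bin/env bash
# Hypothetical check: list directories under ADR_HOME whose owner is not in
# ALLOWED_USERS or whose group is not in ALLOWED_GROUPS (space-separated lists).
find_misowned_dirs() {
    local adr_home="$1"
    local -a users groups user_expr=() group_expr=()
    read -r -a users  <<< "${2:-grid oracle}"
    read -r -a groups <<< "${3:-oinstall dba}"
    local u g
    for u in "${users[@]}";  do user_expr+=( ! -user "$u" );  done
    for g in "${groups[@]}"; do group_expr+=( ! -group "$g" ); done
    # Reported if the owner matches none of the allowed users
    # OR the group matches none of the allowed groups.
    find "$adr_home" -type d \( \( "${user_expr[@]}" \) -o \( "${group_expr[@]}" \) \) -print
}
```

For example, `find_misowned_dirs "$ORACLE_BASE/diag/rdbms/orcl/orcl1"` (a hypothetical ADR_HOME path) prints nothing when every directory is correctly owned.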
• …schema version is mismatched • …library version is mismatched • …schema version is obsolete • …is not registered • …is for an orphaned CRS event or user • …is for an inactive listener
or ORACLE_HOME not present in oratab • Duplicate ORACLE_SIDs are present in oratab • Database unique name does not match its directory name • Can occur during cloning operations • ADR_BASE is not set properly • $ORACLE_HOME/log/diag directory is missing • $ORACLE_HOME/log/diag/adrci_dir.mif missing • $ORACLE_HOME/log/diag/adrci_dir.mif doesn’t list ADR_BASE
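Two of the conditions above can be checked with a few lines of shell: duplicate ORACLE_SIDs in oratab, and an `adrci_dir.mif` that is missing or does not mention ADR_BASE. A minimal sketch; the function names are hypothetical, the oratab layout assumed is the usual `SID:ORACLE_HOME:Y|N` with `#` comments (`*` home-only entries are skipped), and the `.mif` check is a plain substring match, not a parse of the file's format.

```shell
#!/usr/bin/env bash
# Hypothetical checks for two common oratab/ADR problems.

# Print each ORACLE_SID that appears more than once in an oratab file.
oratab_duplicate_sids() {
    local oratab="${1:-/etc/oratab}"
    # Skip comments, blank lines and "*" entries; field 1 is the SID
    awk -F: '!/^[ \t]*#/ && $1 ~ /[^ \t]/ && $1 != "*" { count[$1]++ }
             END { for (sid in count) if (count[sid] > 1) print sid }' "$oratab" | sort
}

# Return 0 if $ORACLE_HOME/log/diag/adrci_dir.mif exists and mentions ADR_BASE.
adrci_mif_ok() {
    local oracle_home="$1" adr_base="$2"
    local mif="$oracle_home/log/diag/adrci_dir.mif"
    if [ ! -f "$mif" ]; then
        printf 'missing: %s\n' "$mif" >&2
        return 1
    fi
    if ! grep -qF -- "$adr_base" "$mif"; then
        printf '%s does not list %s\n' "$mif" "$adr_base" >&2
        return 1
    fi
}
```

`oratab_duplicate_sids` with no argument reads `/etc/oratab`; `adrci_mif_ok "$ORACLE_HOME" "$ORACLE_BASE"` assumes ADR_BASE equals ORACLE_BASE, which is common but not guaranteed.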