• Upload to MOS for some SR
• Diagnostic collections accelerate SR resolution
• Cluster-aware ADR log inspection and management
• Advanced system and log monitoring
• Incident control and notification
• Connect to MOS
  • SMTP, REST APIs
• 400 identified as critical/failures
• Severe problem check daily: 2AM
• All known problem check weekly: 3AM Sunday
• Auto-generates a collection when problems are detected
  • Everything required to diagnose & resolve
• Results delivered to the notification email
TFA
• A version downloaded from MOS
• A version included in the Grid Infrastructure install & patches
  • The GI version is not fully featured
• The GI and MOS versions can interfere and conflict with each other
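A quick way to see which installation and build is actually in use on each node (a minimal sketch; output columns vary slightly between AHF releases):

# Which tfactl is first on the PATH: the GI home copy or the full /opt/oracle.ahf install
which tfactl

# Daemon status, port, version, and build ID for every cluster node
tfactl print status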
# Ensure ALL AHF/TFA processes are stopped/inactive prior to uninstall
# PERFORM THIS STEP ON ALL NODES
for d in $(find / -name uninstalltfa)
do
    cd $(dirname $d)
    ./tfactl uninstall
    # cd .. && rm -fr .
done
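Before running the loop, it helps to confirm nothing AHF/TFA-related is still running; a minimal check, assuming a release where ahfctl is available (process names vary by version):

# Scheduler/daemon status as reported by AHF itself
ahfctl statusahf

# Look for leftover TFA or OSWatcher processes on this node
ps -ef | egrep -i 'tfa|oswbb' | grep -v grep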
AUTORUN_FLAGS = ... -showpass -tag autostart_client_oratier1 -readenvconfig
COLLECTION_RETENTION = 7
AUTORUN_SCHEDULE = 3 2 * * 1,2,3,4,5,6
------------------------------------------------------------
------------------------------------------------------------
ID: orachk.autostart_client
------------------------------------------------------------
AUTORUN_FLAGS = -usediscovery -tag autostart_client -readenvconfig
COLLECTION_RETENTION = 14
AUTORUN_SCHEDULE = 3 3 * * 0
------------------------------------------------------------
Next auto run starts on Feb 03, 2022 02:03:00

ID:orachk.AUTOSTART_CLIENT_ORATIER1

statusahf option in tfactl is deprecated and will be removed in AHF 22.1.0.
Please start using ahfctl for statusahf, Example: ahfctl statusahf

status vs statusahf
.---------------------------+---------------------------------.
| Repository Parameter      | Value                           |
+---------------------------+---------------------------------+
| Old Location              | /opt/oracle.ahf/data/repository |
| New Location              | /some/directory/repository      |
| Current Maximum Size (MB) | 10240                           |
| Current Size (MB)         | 0                               |
| Status                    | OPEN                            |
'---------------------------+---------------------------------'

# Repository commands are applied only on the local node
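A relocation like the one above would typically be done with the repository parameters; a minimal sketch (verify the exact parameter names against the tfactl help for your release):

# Show the current collection repository location and space usage
tfactl print repository

# Point TFA at a new repository directory (node-local, so run on each node)
tfactl set repositorydir=/some/directory/repository

# Optionally adjust the maximum size in MB (default 10240)
tfactl set reposizeMB=10240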
tfactl set trimfiles=ON                      # default
tfactl set reposizeMB=                       # default=10240
tfactl set rtscan=ON                         # default
tfactl set redact=mask                       # default=none

# Disk space monitoring
tfactl set diskUsageMon=ON                   # default=OFF
tfactl set diskUsageMonInterval=240          # Depends on activity. default=60

# Log purge
tfactl set autopurge=ON                      # If space is slim. default=OFF
tfactl set manageLogsAutoPurge=ON            # default=OFF
tfactl set manageLogsAutoPurgeInterval=720   # Set to 12 hours. default=60
tfactl set manageLogsAutoPurgePolicyAge=30d  # default=30
tfactl set minfileagetopurge=48              # default=12
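To confirm the values took effect, the configuration dump used in the next example can be filtered; a minimal check:

# Show only the purge-related parameters of the TFA configuration
tfactl print config | grep -i purge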
[root@node1 ~]# tfactl print config | egrep "^\|.*\|.*\|$" | awk -F'|' '{print $2, $3}' | sort | grep -i manage
 Logs older than the time period will be auto purged(days[d] hours[h]) ( manageLogsAutoPurgePolicyAge )
 Managelogs Auto Purge ( manageLogsAutoPurge )   OFF

[root@node1 ~]# tfactl set manageLogsAutoPurge=ON
Successfully set manageLogsAutoPurge=ON
.-------------------------------------------------------.
|                         node1                         |
+-----------------------------------------------+-------+
| Configuration Parameter                       | Value |
+-----------------------------------------------+-------+
| Managelogs Auto Purge ( manageLogsAutoPurge ) | ON    |
'-----------------------------------------------+-------'

[root@node1 ~]# tfactl print config | egrep "^\|.*\|.*\|$" | awk -F'|' '{print $2, $3}' | sort | grep -i manage
 Logs older than the time period will be auto purged(days[d] hours[h]) ( manageLogsAutoPurgePolicyAge )
 Managelogs Auto Purge ( manageLogsAutoPurge )   ON

[root@node1 ~]# tfactl set manageLogsAutoPurgePolicyAge=30d
Successfully set manageLogsAutoPurgePolicyAge=30d
.----------------------------------------------------------------------------------------------------------------.
|                                                      node1                                                      |
+----------------------------------------------------------------------------------------------------------+-------+
| Configuration Parameter                                                                                   | Value |
+----------------------------------------------------------------------------------------------------------+-------+
| Logs older than the time period will be auto purged(days[d]|hours[h]) ( manageLogsAutoPurgePolicyAge )    | 30d   |
'----------------------------------------------------------------------------------------------------------+-------'
Report space use and variations over time

# Reporting
tfactl managelogs -show usage              # Show all space use in ADR
tfactl managelogs -show usage -gi          # Show GI space use
tfactl managelogs -show usage -database    # Show DB space use
tfactl managelogs -show usage -saveusage   # Save use for variation reports

# Report space use variation
tfactl managelogs -show variation -since 1d
tfactl managelogs -show variation -since 1d -gi
tfactl managelogs -show variation -since 1d -database
The first purge can take a long time for:
• Large directories
• Many files
• NOTE: the purge operation loops over files
Strategies for a first-time purge (see the sketch below):
• Delete in batches by age: 365 days, 180 days, 90 days, etc.
• Delete database and GI homes separately
• Delete for individual SIDs, nodes
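A minimal batching sketch along those lines (the age steps are illustrative; check tfactl managelogs -help in your release for the exact purge options):

# Preview what would be removed before deleting anything
tfactl managelogs -purge -older 365d -dryrun

# Then purge in shrinking age bands, GI and database homes separately
tfactl managelogs -purge -older 365d -gi
tfactl managelogs -purge -older 365d -database
tfactl managelogs -purge -older 180d -gi
tfactl managelogs -purge -older 180d -database
tfactl managelogs -purge -older 90d -gi
tfactl managelogs -purge -older 90d -database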
Files will not be deleted if subdirectories under an ADR_HOME are not owned by grid/oracle or oinstall/dba (a quick ownership check is sketched below):
• One mis-owned subdirectory
  • No files under that ADR_HOME will be purged
  • Even subdirectories with correct ownership!
• Depending on version
  • grid may not be able to delete files in database ADR_HOMEs
  • oracle may not be able to delete files in GI ADR_HOMEs
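A simple way to spot mis-owned subdirectories inside one ADR_HOME (the path is a placeholder; substitute a home reported by tfactl managelogs -show usage, and the expected owner/group for that home):

# Hypothetical database ADR_HOME; list anything not owned by oracle:oinstall
ADR_HOME=/u01/app/oracle/diag/rdbms/orcl/orcl1
find "$ADR_HOME" -type d \( ! -user oracle -o ! -group oinstall \) -exec ls -ld {} \;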
• …schema version is mismatched
• …library version is mismatched
• …schema version is obsolete
• …is not registered
• …is for an orphaned CRS event or user
• …is for an inactive listener
• ORACLE_SID or ORACLE_HOME not present in oratab
• Duplicate ORACLE_SIDs are present in oratab
• Database unique name is mismatched to its directory
  • Can occur during cloning operations
• ADR_BASE is not set properly (see the quick checks below)
• $ORACLE_HOME/log/diag directory is missing
• $ORACLE_HOME/log/diag/adrci_dir.mif is missing
• $ORACLE_HOME/log/diag/adrci_dir.mif doesn't list ADR_BASE
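A few quick shell checks for those conditions (paths assume Linux; adrci ships with both the database and GI homes):

# Duplicate ORACLE_SIDs in oratab
grep -vE '^#|^$' /etc/oratab | cut -d: -f1 | sort | uniq -d

# Does the .mif file exist, and does it list the expected ADR_BASE?
cat $ORACLE_HOME/log/diag/adrci_dir.mif

# Which ADR_BASE does adrci itself resolve?
adrci exec="show base"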
# Also covers OS Watcher logs/output
tfactl analyze

# Options:
#   -search "pattern"       Search in DB and CRS alert logs
#   -last xh|xd             Override the search period (defaults to -last 1h)
#   -verbose
#   timeline file1 file2    Shows timeline for specified files
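For example, using only the options listed above (the error pattern and window are illustrative):

# Search the DB and CRS alert logs for ORA-00600 over the last day
tfactl analyze -search "ORA-00600" -last 1d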
Error: pstack command not found in system. If its installed, please set the PATH and try again.

# pstack is provided by the gdb package on Oracle Linux / RHEL
yum install -y gdb

tfactl> pstack mmon
Output from host : vna1
------------------------------
# pstack output for pid : 15318
#0  0x00007f33bac6928a in semtimedop () from /lib64/libc.so.6
#1  0x0000000011c58285 in sskgpwwait ()
#2  0x0000000011c543db in skgpwwait ()
#3  0x000000001144ccba in ksliwat ()
#4  0x000000001144c06c in kslwaitctx ()
#5  0x0000000011a6fd40 in ksarcv ()
#6  0x00000000038174fa in ksbabs ()
#7  0x0000000003835ab3 in ksbrdp ()
#8  0x0000000003c19a4d in opirip ()
#9  0x00000000024c23e5 in opidrv ()
tfactl diagcollect -srdc -help

<srdc_profile> can be any of the following:

DBCORRUPT          Required Diagnostic Data Collection for a Generic Database Corruption
DBDATAGUARD        Required Diagnostic Data Collection for Data Guard issues including Broker
Listener_Services  SRDC - Data Collection for TNS-12516 / TNS-12518 / TNS-12519 / TNS-12520
Naming_Services    SRDC - Data Collection for ORA-12154 / ORA-12514 / ORA-12528
ORA-00020          SRDC for database ORA-00020 Maximum number of processes exceeded
ORA-00060          SRDC for ORA-00060
ORA-00494          SRDC for ORA-00494
ORA-00600          SRDC for ORA-00600. Internal error code.
...
ora4023            SRDC - ORA-4023 : Checklist of Evidence to Supply
ora4063            SRDC - ORA-4063 : Checklist of Evidence to Supply
ora445             SRDC - ORA-445 or Unable to Spawn Process: Checklist of Evidence to Supply (Doc ID 2500730.1)
xdb600             SRDC - Required Diagnostic Data Collection for XDB ORA-00600 and ORA-07445
zlgeneric          SRDC - Zero Data Loss Recovery Appliance (ZDLRA) Data Collection
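Running one of these profiles is just a matter of passing its name to diagcollect; for example (the SR number is a placeholder, and SRDC collections typically prompt for details such as the time of the error):

# Gather the scripted ORA-00600 evidence set and associate it with an SR
tfactl diagcollect -srdc ORA-00600 -sr 3-00000000000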
-defips                       Include in the default collection the IPS Packages for: ASM, CRS and Databases
-sr                           Enter SR number to which the collection will be uploaded
-node <all|local|n1,n2,..>    Specify comma separated list of host names for collection
-from <time> -to <time> -for <time> -last <n><m|h|d>

-last <n><m|h|d>    Files from last 'n' [m]inutes, 'n' [d]ays or 'n' [h]ours
-since              Same as -last. Kept for backward compatibility.
-from "Mon/dd/yyyy hh:mm:ss"    From <time>
      or "yyyy-mm-dd hh:mm:ss"
      or "yyyy-mm-ddThh:mm:ss"
      or "yyyy-mm-dd"
-to   "Mon/dd/yyyy hh:mm:ss"    To <time>
      or "yyyy-mm-dd hh:mm:ss"
      or "yyyy-mm-ddThh:mm:ss"
      or "yyyy-mm-dd"
-for  "Mon/dd/yyyy"             For <date>
      or "yyyy-mm-dd"
-tag <tagname> -z <zipname> -collectalldirs -collectdir <dir1,dir2..> -collectfiles <file1,..,fileN,dir1,..,dirN> [-onlycollectfiles]

-nocopy            Does not copy back the zip files to the initiating node from all nodes
-notrim            Does not trim the files collected
-tag <tagname>     The files will be collected into the tagname directory inside the repository
-z <zipname>       The collection zip file will be given this name in the collection repository
-collectalldirs    Collect all files from any directory whose "Collect All" flag is set to true
-collectdir        Specify a comma separated list of directories; the collection will include all
                   files from these, irrespective of type and time constraints, in addition to the
                   components specified
-collectfiles      Specify a comma separated list of files/directories; the collection will include
                   these files and directories in addition to the components specified. If
                   -onlycollectfiles is also used, then no other components will be collected.
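Putting a few of these options together (the tag, zip name, and extra file are illustrative):

# CRS-focused collection from the last 4 hours, named and tagged for easy retrieval,
# with one extra file pulled in explicitly
tfactl diagcollect -crs -last 4h -tag case_12345 -z case_12345 -collectfiles /tmp/notes.txt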
tfactl set redact=mask        # Replaces sensitive data in collections with asterisks (*)
tfactl set redact=sanitize    # Replaces sensitive data in collections with random characters
tfactl set redact=none        # No redaction (the default)
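Redaction is applied to what gets gathered into collections, so a typical sequence is to set the mode once and then collect as usual; for example:

# Mask sensitive data in subsequent collections
tfactl set redact=mask
tfactl diagcollect -last 4h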
tfactl diagcollect
    Trim and Zip all files updated in the last 1 hour, as well as chmos/osw data, from across the
    cluster, and collect at the initiating node.
    Note: This collection could be larger than required, but it is the simplest way to capture
    diagnostics if an issue has recently occurred.

tfactl diagcollect -last 8h
    Trim and Zip all files updated in the last 8 hours, as well as chmos/osw data, from across the
    cluster, and collect at the initiating node.

tfactl diagcollect -database hrdb,fdb -last 1d -z foo
    Trim and Zip all files from databases hrdb & fdb in the last 1 day and collect at the
    initiating node.

tfactl diagcollect -crs -os -node node1,node2 -last 6h
    Trim and Zip all crs files, o/s logs and chmos/osw data from node1 & node2 updated in the last
    6 hours and collect at the initiating node.

tfactl diagcollect ... -from "Mar/15/2022" -to "Mar/15/2022 21:00:00"
    Trim and Zip all ASM logs from node1 updated between the from and to times and collect at the
    initiating node.

tfactl diagcollect -for "Mar/15/2022"
    Trim and Zip all log files updated on "Mar/15/2022" and collect at the initiating node.

tfactl diagcollect -for "Mar/15/2022 21:00:00"
    Trim and Zip all log files updated from 09:00 on "Mar/15/2022" to 09:00 on "Mar/16/2022"
    (i.e. 12 hours before and after the time given) and collect at the initiating node.

tfactl diagcollect -crs -collectdir /tmp_dir1,/tmp_dir2
    Trim and Zip all crs files updated in the last 1 hour, and also collect all files from
    /tmp_dir1 and /tmp_dir2, at the initiating node.