Oracle数据库内存溢出或作业不可用导致数据库宕机
金蝶云社区-Romantic
Romantic
9人赞赏了该文章 4,213次浏览 未经作者许可,禁止转载编辑于2020年04月28日 09:16:19

相关报错信息如下所示:
查看数据库的警告日志alert*.log信息进行分析:
Process J000 died, see its trace file
kkjcre1p: unable to spawn jobq slave process
Errors in file /oradata/diag/rdbms/easdb/easdb/trace/easdb_cjq0_13883.trc:
Process W000 died, see its trace file
Process J000 died, see its trace file
kkjcre1p: unable to spawn jobq slave process
Errors in file /oradata/diag/rdbms/easdb/easdb/trace/easdb_cjq0_13883.trc:
Process J000 died, see its trace file
kkjcre1p: unable to spawn jobq slave process
Errors in file /oradata/diag/rdbms/easdb/easdb/trace/easdb_cjq0_13883.trc:
Process W000 died, see its trace file
Process J000 died, see its trace file
kkjcre1p: unable to spawn jobq slave process
Mon Feb 24 08:59:57 2020
MMON (ospid: 32361): terminating the instance due to error 472
Mon Feb 24 08:59:58 2020
System state dump requested by (instance=1, osid=32361 (MMON)), summary=[abnormal instance termination].
System State dumped to trace file /oradata/diag/rdbms/easdb/easdb/trace/easdb_diag_32343_20200224085958.trc
Dumping diagnostic data in directory=[cdmp_20200224085958], requested by (instance=1, osid=32361 (MMON)), summary=[abnormal instance termination].
Instance terminated by MMON, pid = 32361
easdb_diag*.trc:
0 remote Oradebug requests
13971236024 session pga memory
18216854712 session pga memory max
0 recursive system API invocations
542 enqueue timeouts
easdb_cjq0*.trc:
*** 2020-01-31 02:00:00.003
Closing Resource Manager plan via scheduler window
Clearing Resource Manager plan via parameter
Setting Resource Manager plan SCHEDULER[0x32DD]:DEFAULT_MAINTENANCE_PLAN via scheduler window
*** 2020-01-31 22:00:00.005
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Closing scheduler window
*** 2020-02-01 02:00:00.008
Closing Resource Manager plan via scheduler window
Clearing Resource Manager plan via parameter
Setting Resource Manager plan SCHEDULER[0x32DE]:DEFAULT_MAINTENANCE_PLAN via scheduler window
*** 2020-02-24 04:49:02.956
Process J000 is dead (pid=20793 req_ver=137303 cur_ver=137303 state=KSOSP_SPAWNED).
*** 2020-02-24 04:49:04.956
Process J000 is dead (pid=20798 req_ver=137304 cur_ver=137304 state=KSOSP_SPAWNED).
*** 2020-02-24 04:49:06.956
Process J000 is dead (pid=20802 req_ver=137305 cur_ver=137305 state=KSOSP_SPAWNED).
*** 2020-02-24 04:49:08.956
分析解答:
样例中的内存参数值修改仅供参考,要根据现场数据库服务器物理内存大小进行相应的分配调整:
1.数据库参数前期做过如下调整:
SQL>alter system set processes=12000 scope=spfile;
SQL>alter system set sessions=15000 scope=spfile;
SQL>alter system set job_queue_processes=1500 scope=spfile;
SQL>alter system set cursor_sharing=force scope=spfile;
调整后仍有报错。
看报错跟踪日志中可见PGA用到了13G,目前PGA设的初值为2.5G,而服务器总内存为32G,不够用,建议物理内存扩容到64G。
2.加大物理内存到64G后,同时以下参数做调整:
alter system set sga_target=44G scope=spfile sid='*';
alter system set db_cache_size=34G scope=spfile sid='*';
alter system set shared_pool_size=8G scope=spfile sid='*';
alter system set pga_aggregate_target=5G scope=spfile sid='*';
参数调整后,重启数据库参数生效。
针对问题做了数据库参数等调整,调整后效果不佳。

3.关闭这三个数据库自动任务(数据库自动统计分析等在某个业务高峰期触发执行,很耗数据库服务器资源):

AUTO_SPACE_ADVISOR_JOB和auto optimizer stats collection、sql tuning advisor 关闭数据库语句 :
exec dbms_auto_task_admin.disable( client_name=>'auto optimizer stats collection',operation => null,window_name => null);
exec dbms_auto_task_admin.disable( client_name=>'auto space advisor',operation => null,window_name => null);
exec dbms_auto_task_admin.disable( client_name=>'sql tuning advisor',operation => null,window_name => null);
检查作业关闭调度是否成功,查看这个SQL的status字段状态,为disable则没有启用:
select client_name, task_name, operation_name, status from dba_autotask_task;
通过第2点的处理后,问题得到了解决。

4.手动或通过EM或存储过程、shell或批处理脚本执行配置的对数据库用户统计分析。
手动执行统计分析脚本如下所示:
对整个账套做统计分析(在业务不繁忙或晚上进行操作)
begin
dbms_stats.gather_schema_stats(
ownname=> '"EAS75"' , --改为现场EAS账套名称,要为大写名称
cascade=> TRUE,
estimate_percent=> null,
degree=> 2, --根据现场CPU个数情况更改此值,一般CPU个数减2或减掉5个
no_invalidate=> DBMS_STATS.AUTO_INVALIDATE,
granularity=> 'AUTO',
method_opt=> 'FOR ALL COLUMNS SIZE AUTO',
options=> 'GATHER');
end;
/

其中: ownname改为现场的eas账套名,如果在SQL>下执行,则最后要加上“/”,如果在PL/SQL DEVELOPER工具的SQL WINDOW中执行,则不要最后的“/”。

5.Oracle数据库运行时间长了,如3个月以下,会存在数据库僵死进程不释放内存,内存碎片等问题导致内存不够用,建议定期如每三个月做一次数据库服务例行重启,释放内存,对于数据库的性能、防宕机等会起到好的效果。