
Help: error right after starting model training

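Posted yesterday at 20:57
I'm training a SAEHD model. I configured the first pretraining run following the training steps an expert here recommended. After I started it, model initialization stopped at 80% and the message below appeared. Any guidance would be appreciated, thanks.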


Initializing models:  80%|##################################################4            | 4/5 [05:54<01:28, 88.51s/it]
Error: OOM when allocating tensor with shape[524288,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[node src_dst_opt/vs_inter_AB/dense1/weight_0/Assign (defined at D:\Tools\DeepFaceLab-NV_RTX30_40_50\_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py:38) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.




Original stack trace for 'src_dst_opt/vs_inter_AB/dense1/weight_0/Assign':
  File "threading.py", line 884, in _bootstrap
  File "threading.py", line 916, in _bootstrap_inner
  File "threading.py", line 864, in run
  File "D:\Tools\DeepFaceLab-NV_RTX30_40_50\_internal\DeepFaceLab\mainscripts\Trainer.py", line 58, in trainerThread
    debug=debug)
  File "D:\Tools\DeepFaceLab-NV_RTX30_40_50\_internal\DeepFaceLab\models\ModelBase.py", line 193, in __init__
    self.on_initialize()
  File "D:\Tools\DeepFaceLab-NV_RTX30_40_50\_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 341, in on_initialize
    self.src_dst_opt.initialize_variables (self.src_dst_saveable_weights, vars_on_cpu=optimizer_vars_on_cpu, lr_dropout_on_cpu=self.options['lr_dropout']=='cpu')
  File "D:\Tools\DeepFaceLab-NV_RTX30_40_50\_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py", line 38, in initialize_variables
    vs = { v.name : tf.get_variable ( f'vs_{v.name}'.replace(':','_'), v.shape, dtype=v.dtype, initializer=tf.initializers.constant(0.0), trainable=False) for v in trainable_weights }
  File "D:\Tools\DeepFaceLab-NV_RTX30_40_50\_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py", line 38, in <dictcomp>
    vs = { v.name : tf.get_variable ( f'vs_{v.name}'.replace(':','_'), v.shape, dtype=v.dtype, initializer=tf.initializers.constant(0.0), trainable=False) for v in trainable_weights }
  File "D:\Tools\DeepFaceLab-NV_RTX30_40_50\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 1595, in get_variable
    aggregation=aggregation)
  File "D:\Tools\DeepFaceLab-NV_RTX30_40_50\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 1338, in get_variable
    aggregation=aggregation)
  File "D:\Tools\DeepFaceLab-NV_RTX30_40_50\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 593, in get_variable
    aggregation=aggregation)
  File "D:\Tools\DeepFaceLab-NV_RTX30_40_50\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 545, in _true_getter
    aggregation=aggregation)
  File "D:\Tools\DeepFaceLab-NV_RTX30_40_50\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 963, in _get_single_variable
    aggregation=aggregation)
  File "D:\Tools\DeepFaceLab-NV_RTX30_40_50\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variables.py", line 266, in __call__
    return cls._variable_v1_call(*args, **kwargs)
  File "D:\Tools\DeepFaceLab-NV_RTX30_40_50\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variables.py", line 227, in _variable_v1_call
    shape=shape)
  File "D:\Tools\DeepFaceLab-NV_RTX30_40_50\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variables.py", line 205, in <lambda>
    previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
  File "D:\Tools\DeepFaceLab-NV_RTX30_40_50\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 2642, in default_variable_creator
    shape=shape)
  File "D:\Tools\DeepFaceLab-NV_RTX30_40_50\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variables.py", line 270, in __call__
    return super(VariableMetaclass, cls).__call__(*args, **kwargs)
  File "D:\Tools\DeepFaceLab-NV_RTX30_40_50\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variables.py", line 1670, in __init__
    shape=shape)
  File "D:\Tools\DeepFaceLab-NV_RTX30_40_50\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variables.py", line 1853, in _init_from_args
    validate_shape=validate_shape).op
  File "D:\Tools\DeepFaceLab-NV_RTX30_40_50\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\state_ops.py", line 358, in assign
    validate_shape=validate_shape)
  File "D:\Tools\DeepFaceLab-NV_RTX30_40_50\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\gen_state_ops.py", line 59, in assign
    use_locking=use_locking, name=name)
  File "D:\Tools\DeepFaceLab-NV_RTX30_40_50\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 750, in _apply_op_helper
    attrs=attr_protos, op_def=op_def)
  File "D:\Tools\DeepFaceLab-NV_RTX30_40_50\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 3569, in _create_op_internal
    op_def=op_def)
  File "D:\Tools\DeepFaceLab-NV_RTX30_40_50\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 2045, in __init__
    self._traceback = tf_stack.extract_stack_for_node(self._c_op)


Traceback (most recent call last):
  File "D:\Tools\DeepFaceLab-NV_RTX30_40_50\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1375, in _do_call
    return fn(*args)
  File "D:\Tools\DeepFaceLab-NV_RTX30_40_50\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1360, in _run_fn
    target_list, run_metadata)
  File "D:\Tools\DeepFaceLab-NV_RTX30_40_50\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1453, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[524288,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[{{node src_dst_opt/vs_inter_AB/dense1/weight_0/Assign}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.




During handling of the above exception, another exception occurred:


Traceback (most recent call last):
  File "D:\Tools\DeepFaceLab-NV_RTX30_40_50\_internal\DeepFaceLab\mainscripts\Trainer.py", line 58, in trainerThread
    debug=debug)
  File "D:\Tools\DeepFaceLab-NV_RTX30_40_50\_internal\DeepFaceLab\models\ModelBase.py", line 193, in __init__
    self.on_initialize()
  File "D:\Tools\DeepFaceLab-NV_RTX30_40_50\_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 657, in on_initialize
    model.init_weights()
  File "D:\Tools\DeepFaceLab-NV_RTX30_40_50\_internal\DeepFaceLab\core\leras\layers\Saveable.py", line 106, in init_weights
    nn.init_weights(self.get_weights())
  File "D:\Tools\DeepFaceLab-NV_RTX30_40_50\_internal\DeepFaceLab\core\leras\ops\__init__.py", line 48, in init_weights
    nn.tf_sess.run (ops)
  File "D:\Tools\DeepFaceLab-NV_RTX30_40_50\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 968, in run
    run_metadata_ptr)
  File "D:\Tools\DeepFaceLab-NV_RTX30_40_50\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1191, in _run
    feed_dict_tensor, options, run_metadata)
  File "D:\Tools\DeepFaceLab-NV_RTX30_40_50\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1369, in _do_run
    run_metadata)
  File "D:\Tools\DeepFaceLab-NV_RTX30_40_50\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1394, in _do_call
    raise type(e)(node_def, op, message)  # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[524288,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[node src_dst_opt/vs_inter_AB/dense1/weight_0/Assign (defined at D:\Tools\DeepFaceLab-NV_RTX30_40_50\_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py:38) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.





Posted yesterday at 22:40
That's an OOM: you ran out of VRAM.
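For a sense of scale, the size of the tensor named in the error can be worked out from its shape. This is a minimal sketch, assuming `float` in the message means 32-bit (4 bytes per element). The helper name `tensor_bytes` is made up for illustration and is not part of DeepFaceLab or TensorFlow.

```python
from math import prod

def tensor_bytes(shape, bytes_per_element=4):
    """Return the size in bytes of a dense tensor with the given shape."""
    return prod(shape) * bytes_per_element

# Shape copied from the OOM message; 4 bytes per element (float32) is an assumption.
size = tensor_bytes((524288, 256))
print(f"{size} bytes = {size / 2**20:.0f} MiB")  # 536870912 bytes = 512 MiB
```

So this single allocation asks for 512 MiB in one block. The failing node name starts with `src_dst_opt/vs_`, which suggests it is optimizer state for the `inter_AB/dense1` weight, allocated on top of the weight itself and whatever else is already on the GPU.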
Posted yesterday at 23:54
Not enough VRAM. Lower the batch size (BS).
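As a rough illustration of why batch size is the usual first knob: activation memory grows roughly linearly with batch size, while weights and optimizer state do not depend on it. The sketch below uses made-up numbers. The function `estimate_vram_mib` and every figure passed to it are hypothetical, not measured from DeepFaceLab.

```python
def estimate_vram_mib(fixed_mib, per_sample_mib, batch_size):
    """Very rough VRAM estimate: a fixed part plus a part that scales with batch size."""
    return fixed_mib + per_sample_mib * batch_size

# Hypothetical figures, for illustration only.
FIXED_MIB = 6000       # weights + optimizer state, independent of batch size
PER_SAMPLE_MIB = 700   # activations per sample in the batch

for bs in (8, 4, 2):
    print(bs, estimate_vram_mib(FIXED_MIB, PER_SAMPLE_MIB, bs))
```

If the fixed part alone is larger than the card's VRAM, lowering the batch size will not be enough, and a smaller model (lower resolution or dims) would be the next thing to try. The trace above fails while the optimizer variables are still being created, so that case is worth checking here.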
GMT+8, 2025-8-17 11:11