Commit fdac99d

fix(vlm): oom on default gpu_memory_utilization
Lower the tp=4 default (the world_size >= 4 branch) from 0.95 to 0.9 to leave headroom for the vision encoder.

Signed-off-by: chenht2022 <chenht2022@gmail.com>
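As a rough worked example of what "headroom" means here: assuming gpu_memory_utilization is the fraction of each GPU's memory the runtime claims for itself (as the name suggests), the remainder is what is left for anything outside that budget, such as the vision encoder. The 80 GiB per-GPU capacity below is a hypothetical figure; the commit does not name a GPU model.

```python
# Hypothetical per-GPU capacity; the commit does not name a GPU model.
GPU_MEMORY_GIB = 80.0


def headroom_gib(total_gib: float, utilization: float) -> float:
    """Memory left outside the runtime's budget on one GPU (assumed semantics)."""
    return total_gib * (1.0 - utilization)


old = headroom_gib(GPU_MEMORY_GIB, 0.95)  # about 4 GiB left over
new = headroom_gib(GPU_MEMORY_GIB, 0.90)  # about 8 GiB left over
print(f"old headroom: {old:.1f} GiB, new headroom: {new:.1f} GiB")
```

Under these assumptions, a 0.05 drop in the fraction doubles the spare memory per GPU, from roughly 4 GiB to roughly 8 GiB.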
1 parent d053d50 commit fdac99d

1 file changed

Lines changed: 1 addition & 1 deletion


python/tokenspeed/runtime/utils/server_args.py

```diff
@@ -361,7 +361,7 @@ def resolve_memory_and_scheduling(self):
         elif self.mapping.world_size >= 8:
             self.gpu_memory_utilization = 0.81
         elif self.mapping.world_size >= 4:
-            self.gpu_memory_utilization = 0.95
+            self.gpu_memory_utilization = 0.9
         elif self.mapping.world_size >= 2:
             self.gpu_memory_utilization = 0.87
         else:
```
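The branches in this hunk form a tiered default keyed on world size. The function below is a hypothetical standalone mirror of only the tiers visible in the diff; it is not part of tokenspeed. The condition that precedes the first elif and the body of the final else fall outside the hunk, so the sketch returns None for world sizes below 2 instead of guessing a value, and it ignores whatever earlier condition might intercept some configurations before these tiers are reached.

```python
from typing import Optional


def default_gpu_memory_utilization(world_size: int) -> Optional[float]:
    """Hypothetical mirror of the world-size tiers shown in the diff.

    The earlier condition and the final else branch are not visible in
    the hunk, so world sizes below 2 return None rather than a guess.
    """
    if world_size >= 8:
        return 0.81
    if world_size >= 4:
        return 0.9  # lowered from 0.95 by this commit
    if world_size >= 2:
        return 0.87
    return None  # else branch not shown in the diff


for ws in (1, 2, 4, 5, 8):
    print(ws, default_gpu_memory_utilization(ws))
```

Because the thresholds are checked from largest to smallest and the first match wins, every world size from 4 through 7 takes the new 0.9 value, while 8 and above keep 0.81.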
