Quickstart Colab does not succeed on a T4 #49

@gstranger

Description

To Reproduce

Open the currently linked train.ipynb in Colab on a T4 runtime and run all of the setup and execution cells to try training a policy. Launching HumanoidWalkingTask fails with the following error:

WARNING:xax.task.base:Could not resolve task path for HumanoidWalkingTask, returning current working directory
WARNING 2025-06-13 22:00:23 [xax.task.base] Could not resolve task path for HumanoidWalkingTask, returning current working directory
INFO:xax.task.mixins.compile:Setting JAX logging level to INFO
  INFO  2025-06-13 22:00:24 [xax.task.mixins.compile] Setting JAX logging level to INFO
INFO:xax.task.mixins.compile:Setting JAX compilation cache directory to /root/.cache/jax/jaxcache
  INFO  2025-06-13 22:00:24 [xax.task.mixins.compile] Setting JAX compilation cache directory to /root/.cache/jax/jaxcache
INFO:xax.task.mixins.compile:Configuring JAX compilation cache parameters
  INFO  2025-06-13 22:00:24 [xax.task.mixins.compile] Configuring JAX compilation cache parameters
INFO:2025-06-13 22:00:25,080:jax._src.xla_bridge:924: Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'
INFO:jax._src.xla_bridge:Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'
  INFO  2025-06-13 22:00:25 [jax._src.xla_bridge] Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'
INFO:2025-06-13 22:00:25,107:jax._src.xla_bridge:924: Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
INFO:jax._src.xla_bridge:Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
  INFO  2025-06-13 22:00:25 [jax._src.xla_bridge] Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
WARNING:xax.task.mixins.artifacts:Could not resolve task path for HumanoidWalkingTask, returning current working directory
WARNING 2025-06-13 22:00:26 [xax.task.mixins.artifacts] Could not resolve task path for HumanoidWalkingTask, returning current working directory
STATUS:xax.task.mixins.artifacts:/content/humanoid_walking_task/run_0
 STATUS 2025-06-13 22:00:26 [xax.task.mixins.artifacts] /content/humanoid_walking_task/run_0
WARNING:xax.task.base:Could not resolve task path for %s, returning current working directory
WARNING 2025-06-13 22:00:26 [xax.task.base] Could not resolve task path for %s, returning current working directory
STATUS:xax.task.mixins.train:/content
 STATUS 2025-06-13 22:00:26 [xax.task.mixins.train] /content
WARNING:py.warnings:/usr/local/lib/python3.11/dist-packages/kscale/conf.py:44: UserWarning: Settings directory does not exist: /root/.kscale. Creating it now.
  warnings.warn(f"Settings directory does not exist: {dir_path}. Creating it now.")

WARNING 2025-06-13 22:00:26 [py.warnings] /usr/local/lib/python3.11/dist-packages/kscale/conf.py:44: UserWarning: Settings directory does not exist: /root/.kscale. Creating it now.
  warnings.warn(f"Settings directory does not exist: {dir_path}. Creating it now.")

STATUS:xax.task.mixins.train:humanoid_walking_task
 STATUS 2025-06-13 22:00:26 [xax.task.mixins.train] humanoid_walking_task
STATUS:xax.task.mixins.train:JAX devices: [CudaDevice(id=0)]
 STATUS 2025-06-13 22:00:26 [xax.task.mixins.train] JAX devices: [CudaDevice(id=0)]
WARNING:xax.task.base:Could not resolve task path for HumanoidWalkingTask, returning current working directory
WARNING 2025-06-13 22:00:26 [xax.task.base] Could not resolve task path for HumanoidWalkingTask, returning current working directory
INFO:httpx:HTTP Request: GET https://api.kscale.dev/robots/urdf/kbot "HTTP/1.1 200 OK"
  INFO  2025-06-13 22:00:27 [httpx] HTTP Request: GET https://api.kscale.dev/robots/urdf/kbot "HTTP/1.1 200 OK"
INFO:kscale.web.clients.robot_class:Downloading URDF file from https://kscale-www-production.s3.amazonaws.com/urdfs/a852021cad90fba8/robot.tgz?AWSAccessKeyId=ASIA2R4HRCAHS5LUR3TY&Signature=S59CHBBrSNFOSVOQv7MF9Mfn0Vk%3D&x-amz-security-token=...
  INFO  2025-06-13 22:00:27 [kscale.web.clients.robot_class] Downloading URDF file from https://kscale-www-production.s3.amazonaws.com/urdfs/a852021cad90fba8/robot.tgz?AWSAccessKeyId=ASIA2R4HRCAHS5LUR3TY&Signature=S59CHBBrSNFOSVOQv7MF9Mfn0Vk%3D&x-amz-security-token=...
INFO:httpx:HTTP Request: GET https://kscale-www-production.s3.amazonaws.com/urdfs/a852021cad90fba8/robot.tgz?AWSAccessKeyId=ASIA2R4HRCAHS5LUR3TY&Signature=S59CHBBrSNFOSVOQv7MF9Mfn0Vk%3D&x-amz-security-token=... "HTTP/1.1 200 OK"
  INFO  2025-06-13 22:00:28 [httpx] HTTP Request: GET https://kscale-www-production.s3.amazonaws.com/urdfs/a852021cad90fba8/robot.tgz?AWSAccessKeyId=ASIA2R4HRCAHS5LUR3TY&Signature=S59CHBBrSNFOSVOQv7MF9Mfn0Vk%3D&x-amz-security-token=... "HTTP/1.1 200 OK"
INFO:kscale.web.clients.robot_class:Checking MD5 hash of downloaded file
  INFO  2025-06-13 22:00:30 [kscale.web.clients.robot_class] Checking MD5 hash of downloaded file
INFO:kscale.web.clients.robot_class:Updating downloaded file information
  INFO  2025-06-13 22:00:30 [kscale.web.clients.robot_class] Updating downloaded file information
INFO:kscale.web.clients.robot_class:Unpacking URDF file
  INFO  2025-06-13 22:00:30 [kscale.web.clients.robot_class] Unpacking URDF file
INFO:kscale.web.clients.robot_class:Updating downloaded file information
  INFO  2025-06-13 22:00:30 [kscale.web.clients.robot_class] Updating downloaded file information
INFO:httpx:HTTP Request: GET https://api.kscale.dev/robots/name/kbot "HTTP/1.1 200 OK"
  INFO  2025-06-13 22:00:32 [httpx] HTTP Request: GET https://api.kscale.dev/robots/name/kbot "HTTP/1.1 200 OK"
INFO:xax.task.mixins.train:Starting a new training run
  INFO  2025-06-13 22:00:33 [xax.task.mixins.train] Starting a new training run
PING:ksim.task.rl:Model size: 1,090,861 parameters
  PING  2025-06-13 22:00:36 [ksim.task.rl] Model size: 1,090,861 parameters
PING:ksim.task.rl:Optimizer size: 2,181,722 parameters
  PING  2025-06-13 22:00:36 [ksim.task.rl] Optimizer size: 2,181,722 parameters
INFO:root:Using JAX default device: cuda:0.
  INFO  2025-06-13 22:00:36 [root] Using JAX default device: cuda:0.
INFO:root:MJX Warp is disabled via MJX_WARP_ENABLED=false.
  INFO  2025-06-13 22:00:36 [root] MJX Warp is disabled via MJX_WARP_ENABLED=false.
INFO:root:Using JAX default device: cuda:0.
  INFO  2025-06-13 22:00:40 [root] Using JAX default device: cuda:0.
INFO:root:MJX Warp is disabled via MJX_WARP_ENABLED=false.
  INFO  2025-06-13 22:00:40 [root] MJX Warp is disabled via MJX_WARP_ENABLED=false.

Status
 ✦ JAX devices: [CudaDevice(id=0)]
 ✦ humanoid_walking_task
 ✦ /content
 ✦ /content/humanoid_walking_task/run_0

Pings
 ✦ Optimizer size: 2,181,722 parameters
 ✦ Model size: 1,090,861 parameters
 ✦ Could not resolve task path for HumanoidWalkingTask, returning current working directory
 ✦ /usr/local/lib/python3.11/dist-packages/kscale/conf.py:44: UserWarning: Settings directory does not exist: /root/.kscale. Creating it now.
  warnings.warn(f"Settings directory does not exist: {dir_path}. Creating it now.")

 ✦ Could not resolve task path for %s, returning current working directory
 ✦ Could not resolve task path for HumanoidWalkingTask, returning current working directory
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
[<ipython-input-10-682409069>](https://omdsnnyzngh-496ff2e9c6d22116-0-colab.googleusercontent.com/outputframe.html?vrz=colab_20250612-060058_RC00_770574881#) in <cell line: 0>()
      1 if __name__ == "__main__":
----> 2     HumanoidWalkingTask.launch(
      3         HumanoidWalkingTaskConfig(
      4             # Training parameters.
      5             num_envs=2048,

11 frames
    [... skipping hidden 7 frame]

    [... skipping hidden 1 frame]

    [... skipping hidden 15 frame]

[/usr/lib/python3.11/dataclasses.py](https://omdsnnyzngh-496ff2e9c6d22116-0-colab.googleusercontent.com/outputframe.html?vrz=colab_20250612-060058_RC00_770574881#) in replace(obj, **changes)
   1501     # changes that aren't fields, this will correctly raise a
   1502     # TypeError.
-> 1503     return obj.__class__(**changes)

TypeError: Data.__init__() got an unexpected keyword argument 'cacc'
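
For anyone triaging this: the final frame shows `dataclasses.replace` being handed a keyword (`cacc`) that the target `Data` class does not declare as a field. The hidden frames do not reveal which package defines `Data` or which one passes `cacc`, so the cause is unconfirmed; one possibility is that two installed packages disagree about the fields of that class. The sketch below reproduces the same failure mode with a stand-in `Data` class (hypothetical, not the real one) and lists the installed versions of packages that might be involved. The package names in the list are assumptions based on the module names in the log, and `try_replace` and `installed_versions` are illustrative helpers, not part of any library.

```python
import dataclasses
from importlib import metadata


@dataclasses.dataclass(frozen=True)
class Data:
    """Stand-in state container; NOT the real class from the traceback."""

    qpos: float
    qvel: float


def try_replace(obj, **changes):
    """Return None if dataclasses.replace succeeds, else the TypeError message."""
    try:
        dataclasses.replace(obj, **changes)
    except TypeError as exc:
        return str(exc)
    return None


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Passing a keyword that is not a declared field fails the same way
    # as the last frame of the traceback above.
    print(try_replace(Data(qpos=0.0, qvel=1.0), cacc=0.0))

    # Assumed distribution names; adjust to whatever the notebook installs.
    candidates = ["jax", "jaxlib", "mujoco", "mujoco-mjx", "ksim", "xax", "kscale"]
    for name, version in installed_versions(candidates).items():
        print(f"{name}: {version}")
```

Running the version listing in the same Colab runtime, right after the setup cells, and attaching the output to this issue would show whether the notebook resolved an unexpected combination of releases.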
