Convert all of Casso (cores + GUI + CLI) to a purely Unicode application #79

Description

@relmer

Summary

Convert all of Casso (CassoCore, CassoEmuCore, the Casso GUI, CassoCli, and UnitTest) to a purely Unicode application: wide (wchar_t/std::wstring) text and filesystem paths end-to-end, with UNICODE/_UNICODE defined for every project. This eliminates the lossy wstring ⇄ narrow/ANSI std::string conversions that are the root cause of a recurring class of encoding bugs.

Scope note: "Casso" here means the entire solution, including the cores and the CLI — not just the GUI shell. Casso is Windows-only (v145 toolset, Win32 / D3D11 / WASAPI), so wchar_t portability is a non-issue; there is no reason to keep the cores on narrow strings.

Motivation

Three separate path-encoding bugs were found and patched individually, all with the same root cause: paths shuttle between std::wstring (Win32 boundary) and ANSI-narrowed std::string, and each lossy hop corrupts any non-ASCII filename (e.g. the ø in Brøderbund):

  1. Drive-widget label tofu: DiskManager.cpp widened the native-narrow store path with std::wstring(s.begin(), s.end()), sign-extending byte 0xF8 to U+FFF8 (tofu box) instead of U+00F8 (ø).
  2. Disk MRU not remembered: DiskMru::ToUtf8/FromUtf8 used the platform-narrow path::string()/path(string), writing invalid UTF-8 into recentDisks; the boot picker's fs::exists prune then silently dropped the entry.
  3. disk1Path auto-load mangled: DiskSettings::WideToNarrowAscii/NarrowToWideAscii truncated each UTF-16 unit to its low byte, corrupting the persisted last-disk path.
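
The first and third failure modes can be reproduced in isolation. The sketch below is illustrative only: the function names are hypothetical and are not taken from Casso, and char16_t stands in for Windows' 16-bit wchar_t so that the arithmetic is the same on any platform.

```cpp
#include <cstdint>
#include <string>

// Reproduces what std::wstring(s.begin(), s.end()) does on MSVC, where char
// is signed and wchar_t is 16 bits: every byte is sign-extended.
std::u16string WidenBySignExtension(const std::string& narrow)
{
    std::u16string wide;
    for (char c : narrow)
    {
        const auto asSigned = static_cast<signed char>(c);
        wide.push_back(static_cast<char16_t>(static_cast<std::int16_t>(asSigned)));
    }
    return wide;
}

// Zero-extends each byte instead. This is only correct when the bytes are
// Latin-1; it is shown for contrast, not as a general-purpose conversion.
std::u16string WidenLatin1(const std::string& narrow)
{
    std::u16string wide;
    for (char c : narrow)
    {
        wide.push_back(static_cast<char16_t>(static_cast<unsigned char>(c)));
    }
    return wide;
}

// Reproduces the low-byte truncation: each UTF-16 unit keeps only its low 8 bits.
std::string TruncateToLowByte(const std::u16string& wide)
{
    std::string narrow;
    for (char16_t unit : wide)
    {
        narrow.push_back(static_cast<char>(unit & 0xFF));
    }
    return narrow;
}
```

For the input byte 0xF8, the first function yields U+FFF8 and the second yields U+00F8. Truncating U+4E2D (中) leaves the single byte 0x2D, an ASCII hyphen, which is why a CJK filename cannot survive that hop at all.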

These were each fixed point-wise, but the underlying mixed-string pipeline remains and will keep spawning latent tofu/mojibake bugs — and outright file-open failures for filenames outside the active ANSI code page (e.g. CJK names), which fs::path::string() cannot even represent and ifstream(narrowPath) cannot open.

Scope (all projects)

  • CassoCore: convert path/string handling to std::wstring; any file I/O (e.g. assembler source/listing/output) opens via wide paths.
  • CassoEmuCore: DiskImageStore, DiskImage, WozLoader, ROM/character-ROM loaders, and config file I/O take std::wstring paths and open files via wide paths (ifstream/ofstream accept std::filesystem::path, and on Windows also const wchar_t*) so non-ANSI filenames open correctly. GetSourcePath/Mount signatures become wide.
  • Casso (GUI): already mostly wide at the Win32 boundary; remove the remaining narrow hops (CpuManager::PostCommand payload, DiskManager ⇄ store, prefs serialization) and keep wstring throughout. Confirm UNICODE/_UNICODE and wide (-W) Win32 APIs everywhere.
  • CassoCli: becomes a wide console app (wmain, wide argv, wide path handling).
  • UnitTest: update fixtures/assertions to wide; add non-ASCII and non-ANSI (e.g. CJK) filename round-trip coverage.
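
As a rough illustration of "open via wide paths", a loader could accept a std::wstring and hand a std::filesystem::path to the stream. ReadAllBytes is a hypothetical helper, not an existing Casso function, and the error handling is deliberately minimal.

```cpp
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: reads a whole file given a wide path.
// std::ifstream has accepted std::filesystem::path since C++17, so on Windows
// the wide path reaches the OS without passing through the ANSI code page.
std::vector<char> ReadAllBytes(const std::wstring& widePath)
{
    const std::filesystem::path path(widePath);
    std::ifstream file(path, std::ios::binary);
    if (!file)
    {
        // This sketch does not distinguish "missing" from "empty".
        return {};
    }
    return std::vector<char>(std::istreambuf_iterator<char>(file),
                             std::istreambuf_iterator<char>());
}
```

The point of the shape is that no std::string path exists at any step, so there is nothing for a code-page conversion to corrupt.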

Boundary surfaces that currently force a narrow hop

  • CpuManager::PostCommand(WORD id, const std::string& payload) — command-queue payload carrying the path.
  • DiskImageStore::Mount(int, int, const string&) / GetSourcePath() — opens via ifstream(path, ios::binary) (DiskImageStore.cpp:185, :259).
  • DiskSettings, DiskMru, GlobalUserPrefs (recentDisks, disk1Path/disk2Path) — path values stored in JSON.

JSON is UTF-8 — not a special case

JSON has a mandated encoding: UTF-8 (RFC 8259 §8.1). Casso's JSON layer already honors this and is byte-transparent — it never does ANSI anything:

  • JsonWriter.cpp:248: "Bytes >= 0x80 are assumed UTF-8" — high bytes pass through untouched; only control chars < 0x20 are escaped as \uXXXX.
  • JsonParser.cpp decodes \u escapes back to UTF-8.

So there is no narrow-ANSI hop in the JSON path and no "wrinkle" to design around. The three bugs above came from feeding ANSI/byte-truncated data into a layer that already expects UTF-8 — not from JSON itself.
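
To make "byte-transparent" concrete, here is a minimal sketch of a string escaper with that behavior. It is not the code in JsonWriter.cpp; EscapeJsonString is a hypothetical name, and the handling of quotes and backslashes is an assumption added because JSON requires it.

```cpp
#include <cstdio>
#include <string>

// Hypothetical sketch of a byte-transparent JSON string escaper. The input is
// assumed to be valid UTF-8 already; bytes >= 0x80 are copied through as-is.
std::string EscapeJsonString(const std::string& utf8)
{
    std::string out = "\"";
    for (unsigned char byte : utf8)
    {
        if (byte == '"' || byte == '\\')
        {
            // Assumption: quote and backslash get a backslash escape.
            out += '\\';
            out += static_cast<char>(byte);
        }
        else if (byte < 0x20)
        {
            // Control characters become \uXXXX.
            char escape[7];
            std::snprintf(escape, sizeof escape, "\\u%04X", static_cast<unsigned>(byte));
            out += escape;
        }
        else
        {
            // Printable ASCII and every UTF-8 lead/continuation byte.
            out += static_cast<char>(byte);
        }
    }
    out += '"';
    return out;
}
```

A writer of this shape is only as correct as its input: given the UTF-8 bytes C3 B8 for ø it emits valid JSON, but given the ANSI byte F8 it faithfully emits invalid UTF-8, which is the failure described above.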

For the wide-everywhere target, the clean model is: the JSON parser/writer are the single UTF-8 boundary — the parser decodes UTF-8 → wstring on read and the writer encodes wstring → UTF-8 on write. JsonValue then holds wide strings in memory, so no std::string appears anywhere in app memory; UTF-8 exists only as the on-disk wire form inside the parser/writer. (Casso's JsonValue/JsonParser/JsonWriter is duplicated in TCDir; mirror whatever lands here.)
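
The conversion pair that would live inside the parser/writer could look roughly like the following. This is a portable sketch under stated assumptions: WideToUtf8 and Utf8ToWide here are hand-written stand-ins (a Windows build could equally use the Win32 CP_UTF8 conversions), malformed input is replaced with U+FFFD, and overlong UTF-8 forms are not rejected.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

namespace
{
    // Appends one code point as 1 to 4 UTF-8 bytes.
    void AppendUtf8(std::string& out, std::uint32_t cp)
    {
        if (cp < 0x80) { out += static_cast<char>(cp); return; }
        if (cp < 0x800) { out += static_cast<char>(0xC0 | (cp >> 6)); }
        else if (cp < 0x10000)
        {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        }
        else
        {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        }
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }

    // Appends one code point; uses a surrogate pair when wchar_t is 16 bits.
    void AppendWide(std::wstring& out, std::uint32_t cp)
    {
        if (sizeof(wchar_t) == 2 && cp >= 0x10000)
        {
            cp -= 0x10000;
            out += static_cast<wchar_t>(0xD800 + (cp >> 10));
            out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
            return;
        }
        out += static_cast<wchar_t>(cp);
    }
}

// Writer side: wide string in memory -> UTF-8 bytes on disk.
std::string WideToUtf8(const std::wstring& wide)
{
    std::string out;
    for (std::size_t i = 0; i < wide.size(); ++i)
    {
        std::uint32_t cp = static_cast<std::uint32_t>(wide[i]);
        if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF && i + 1 < wide.size())
        {
            const std::uint32_t low = static_cast<std::uint32_t>(wide[i + 1]);
            if (low >= 0xDC00 && low <= 0xDFFF)
            {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;
            }
        }
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) { cp = 0xFFFD; }
        AppendUtf8(out, cp);
    }
    return out;
}

// Parser side: UTF-8 bytes on disk -> wide string in memory.
std::wstring Utf8ToWide(const std::string& utf8)
{
    std::wstring out;
    std::size_t i = 0;
    while (i < utf8.size())
    {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::uint32_t cp = 0xFFFD;   // default for an invalid lead byte
        std::size_t length = 1;
        if (lead < 0x80) { cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; length = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; length = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; length = 4; }
        if (i + length > utf8.size()) { cp = 0xFFFD; length = 1; }
        for (std::size_t k = 1; k < length; ++k)
        {
            const unsigned char cont = static_cast<unsigned char>(utf8[i + k]);
            if ((cont & 0xC0) != 0x80) { cp = 0xFFFD; length = k; break; }
            cp = (cp << 6) | (cont & 0x3F);
        }
        if (cp > 0x10FFFF) { cp = 0xFFFD; }
        i += length;
        AppendWide(out, cp);
    }
    return out;
}
```

With a pair like this confined to the parser and writer, JsonValue and every caller see only std::wstring, and the per-site converters have nothing left to do.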

Proposed approach

  1. Define UNICODE/_UNICODE for all projects; switch all Win32 calls to the wide variants.
  2. Make every path-bearing signature std::wstring (cores included); make all file I/O open via wide paths.
  3. Move the UTF-8 encode/decode inside the JSON parser/writer (the I/O boundary): parser yields wstring, writer accepts wstring, on-disk bytes stay UTF-8. JsonValue holds wide in memory. Delete the ad-hoc per-site wstring⇄narrow converters (DiskSettings, ThemeLoader, etc.).
  4. Ban the bug-prone idioms on path values: wstring(x.begin(), x.end()) (sign-extends) and fs::path::string()/path(string) (ANSI). Add a grep/lint guard.
  5. Add round-trip tests across the full mount → persist → reload → drive-label path with non-ASCII and non-ANSI (CJK) filenames.
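
The lint guard in step 4 could start as a simple pattern check over source lines. The sketch below is one possible shape, not an existing tool: ContainsBannedIdiom is a hypothetical name, the patterns are deliberately crude (they also flag non-path uses and do not catch path(string) construction), and a real guard would need an allow-list.

```cpp
#include <regex>
#include <string>

// Hypothetical lint check: returns true when a source line contains one of the
// banned conversions. The patterns are crude and will also flag non-path uses.
bool ContainsBannedIdiom(const std::string& line)
{
    // wstring(x.begin(), x.end()): per-element widening that sign-extends bytes.
    static const std::regex rangeWiden(
        R"(wstring\s*\(\s*\w+\.begin\s*\(\s*\)\s*,\s*\w+\.end\s*\(\s*\)\s*\))");
    // .string(): fs::path narrowing through the ANSI code page on Windows.
    // ".wstring()" and ".u8string()" do not match because the dot must be
    // immediately followed by "string".
    static const std::regex pathNarrow(R"(\.string\s*\(\s*\))");

    return std::regex_search(line, rangeWiden) || std::regex_search(line, pathNarrow);
}
```

Run over each line of the tree in a pre-commit hook or a unit test, a check like this would fail the build as soon as one of the two idioms reappears.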

Already-landed point fixes (context, not a substitute for this work)

  • DiskSettings.cpp: WideToNarrowAscii/NarrowToWideAscii → UTF-8 WideToUtf8/Utf8ToWide (CP_UTF8).
  • DiskMru.cpp: ToUtf8/FromUtf8 → u8string/char8_t.
  • DiskManager.cpp: drive-label widen wstring(begin,end) → fs::path(src).wstring().

Acceptance criteria

  • Every project builds with UNICODE/_UNICODE; all Win32 calls are wide.
  • No path-bearing API in any project (cores included) takes/returns narrow std::string; no fs::path::string()/path(string) or wstring(begin,end) conversions remain on path values.
  • The JSON parser/writer are the only UTF-8 boundary; JsonValue holds wide strings; on-disk JSON stays valid UTF-8.
  • Mounting, persisting, recalling, and labeling a disk image whose filename contains non-ASCII and non-ANSI (CJK) characters all round-trip, open, and render correctly.
  • CassoCli handles wide argv/paths (wmain).

Labels: enhancement (New feature or request)
