Summary
Convert all of Casso — CassoCore, CassoEmuCore, the Casso GUI, CassoCli, and UnitTest — to a purely Unicode application: wide (wchar_t/std::wstring) text and filesystem paths end-to-end, with UNICODE/_UNICODE defined for every project. This eliminates the lossy wstring ⇄ narrow/ANSI std::string conversions that are the root cause of a recurring class of encoding bugs.
Scope note: "Casso" here means the entire solution, including the cores and the CLI — not just the GUI shell. Casso is Windows-only (v145 toolset, Win32 / D3D11 / WASAPI), so wchar_t portability is a non-issue; there is no reason to keep the cores on narrow strings.
Motivation
Three separate path-encoding bugs were found and patched individually, all with the same root cause: paths shuttle between std::wstring (Win32 boundary) and ANSI-narrowed std::string, and each lossy hop corrupts any non-ASCII filename (e.g. the ø in Brøderbund):
- Drive-widget label tofu: DiskManager.cpp widened the native-narrow store path with std::wstring(s.begin(), s.end()), sign-extending byte 0xF8 to U+FFF8 (rendered as a tofu box) instead of U+00F8 (ø).
- Disk MRU not remembered: DiskMru::ToUtf8/FromUtf8 used the platform-narrow path::string()/path(string), writing invalid UTF-8 into recentDisks; the boot picker's fs::exists prune then silently dropped the entry.
- disk1Path auto-load mangled: DiskSettings::WideToNarrowAscii/NarrowToWideAscii truncated each UTF-16 unit to its low byte, corrupting the persisted last-disk path.
These were each fixed point-wise, but the underlying mixed-string pipeline remains and will keep spawning latent tofu/mojibake bugs — and outright file-open failures for filenames outside the active ANSI code page (e.g. CJK names), which fs::path::string() cannot even represent and ifstream(narrowPath) cannot open.
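To make the first bug concrete, here is a minimal sketch of what an element-wise widen does to byte 0xF8. It is not the DiskManager.cpp code: the function names are hypothetical, and char16_t stands in for the 16-bit wchar_t of MSVC so that the sketch behaves identically on any platform.

```cpp
#include <cassert>
#include <string>

// Models std::wstring(s.begin(), s.end()) on MSVC, where plain char is signed
// and wchar_t is 16 bits: every byte >= 0x80 is sign-extended.
// (Assumption: char16_t is used here as a portable stand-in for wchar_t.)
std::u16string WidenSignExtended(const std::string& narrow) {
    std::u16string wide;
    for (char c : narrow) {
        wide.push_back(static_cast<char16_t>(static_cast<signed char>(c)));
    }
    return wide;
}

// Zero-extends each byte instead. This yields the right code point only when
// the narrow bytes are Latin-1; it is shown for contrast, not as a general fix.
std::u16string WidenZeroExtended(const std::string& narrow) {
    std::u16string wide;
    for (char c : narrow) {
        wide.push_back(static_cast<char16_t>(static_cast<unsigned char>(c)));
    }
    return wide;
}
```

With the single byte "\xF8" as input, the first function produces the code unit 0xFFF8 and the second produces 0x00F8 (ø). Neither is a real decoder, which is why the proposal removes the narrow hop rather than patching the cast.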
Scope (all projects)
- CassoCore: convert path/string handling to std::wstring; any file I/O (e.g. assembler source/listing/output) opens via wide paths.
- CassoEmuCore: DiskImageStore, DiskImage, WozLoader, ROM/character-ROM loaders, and config file I/O take std::wstring paths and open their byte streams with wide paths (ifstream/ofstream accept std::filesystem::path/const wchar_t* on Windows) so non-ANSI filenames open correctly. GetSourcePath/Mount signatures become wide.
- Casso (GUI): already mostly wide at the Win32 boundary; remove the remaining narrow hops (CpuManager::PostCommand payload, DiskManager ⇄ store, prefs serialization) and keep wstring throughout. Confirm UNICODE/_UNICODE and wide (-W) Win32 APIs everywhere.
- CassoCli: becomes a wide console app with wmain, wide argv, and wide path handling.
- UnitTest: update fixtures/assertions to wide; add non-ASCII and non-ANSI (e.g. CJK) filename round-trip coverage.
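As an illustration of "open via wide paths", the sketch below keeps the path as a std::filesystem::path all the way to the stream constructor, so no narrow conversion ever happens. ReadAllBytes is a hypothetical helper, not an existing Casso function; the real loaders may be structured differently.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <optional>
#include <vector>

// Hypothetical helper: reads a whole file as bytes, or returns std::nullopt
// if the file cannot be opened. std::filesystem::path is natively wide on
// Windows, and std::ifstream accepts it directly (C++17).
std::optional<std::vector<std::uint8_t>> ReadAllBytes(const std::filesystem::path& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return std::nullopt;
    }
    std::istreambuf_iterator<char> first{in};
    std::istreambuf_iterator<char> last{};
    return std::vector<std::uint8_t>(first, last);
}
```

A caller holding a std::wstring would pass it as std::filesystem::path(widePath); the point is only that .string() never appears between the Win32 boundary and the stream.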
Boundary surfaces that currently force a narrow hop
- CpuManager::PostCommand(WORD id, const std::string& payload): command-queue payload carrying the path.
- DiskImageStore::Mount(int, int, const string&) / GetSourcePath(): opens via ifstream(path, ios::binary) (DiskImageStore.cpp:185, :259).
- DiskSettings, DiskMru, GlobalUserPrefs (recentDisks, disk1Path/disk2Path): path values stored in JSON.
JSON is UTF-8 — not a special case
JSON has a mandated encoding: UTF-8 (RFC 8259 §8.1). Casso's JSON layer already honors this and is byte-transparent — it never does ANSI anything:
- JsonWriter.cpp:248: "Bytes >= 0x80 are assumed UTF-8". High bytes pass through untouched; only control chars < 0x20 are escaped as \uXXXX.
- JsonParser.cpp decodes \u escapes back to UTF-8.
So there is no narrow-ANSI hop in the JSON path and no "wrinkle" to design around. The three bugs above came from feeding ANSI/byte-truncated data into a layer that already expects UTF-8 — not from JSON itself.
For the wide-everywhere target, the clean model is: the JSON parser/writer are the single UTF-8 boundary — the parser decodes UTF-8 → wstring on read and the writer encodes wstring → UTF-8 on write. JsonValue then holds wide strings in memory, so no std::string appears anywhere in app memory; UTF-8 exists only as the on-disk wire form inside the parser/writer. (Casso's JsonValue/JsonParser/JsonWriter is duplicated in TCDir; mirror whatever lands here.)
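The writer half of that boundary could look roughly like the sketch below: UTF-16 in, UTF-8 bytes out. This is an assumption-laden illustration, not JsonWriter code. Utf16ToUtf8 is a hypothetical name, char16_t again stands in for the 16-bit wchar_t, and replacing unpaired surrogates with U+FFFD is one possible policy among several. On Windows the same job can be done with WideCharToMultiByte and CP_UTF8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Encodes UTF-16 code units as UTF-8 bytes. Surrogate pairs are combined into
// one code point; an unpaired surrogate is replaced with U+FFFD (assumed policy).
std::string Utf16ToUtf8(const std::u16string& wide) {
    std::string out;
    for (std::size_t i = 0; i < wide.size(); ++i) {
        char32_t cp = wide[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: valid only when followed by a low surrogate.
            if (i + 1 < wide.size() && wide[i + 1] >= 0xDC00 && wide[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (wide[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // Low surrogate with no preceding high surrogate.
        }

        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

The parser side would be the inverse decode. Keeping both inside the parser/writer means call sites such as DiskSettings never see the narrow form at all.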
Proposed approach
- Define UNICODE/_UNICODE for all projects; switch all Win32 calls to the wide variants.
- Make every path-bearing signature std::wstring (cores included); make all file I/O open via wide paths.
- Move the UTF-8 encode/decode inside the JSON parser/writer (the I/O boundary): parser yields wstring, writer accepts wstring, on-disk bytes stay UTF-8. JsonValue holds wide in memory. Delete the ad-hoc per-site wstring⇄narrow converters (DiskSettings, ThemeLoader, etc.).
- Ban the bug-prone idioms on path values: wstring(x.begin(), x.end()) (sign-extends) and fs::path::string()/path(string) (ANSI). Add a grep/lint guard.
- Add round-trip tests across the full mount → persist → reload → drive-label path with non-ASCII and non-ANSI (CJK) filenames.
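One possible shape for the grep/lint guard is a per-line check like the sketch below. HasBannedPathIdiom and both patterns are hypothetical. A regex cannot tell whether .string() is being called on a path, and it does not attempt to catch path(string) construction, so a real guard would need an allow-list or a smarter check.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical lint check: returns true if a source line contains one of the
// banned conversions. Purely textual, so false positives are possible.
bool HasBannedPathIdiom(const std::string& line) {
    // wstring(x.begin(), x.end()): element-wise widen that sign-extends bytes >= 0x80.
    static const std::regex widenByIterators(
        R"(wstring\s*\(\s*[^(),]*begin\s*\(\s*\)\s*,\s*[^(),]*end\s*\(\s*\)\s*\))");
    // .string(): on a path this narrows to the active ANSI code page on Windows.
    // The leading dot keeps .wstring() and .u8string() from matching.
    static const std::regex narrowString(R"(\.string\s*\(\s*\))");

    return std::regex_search(line, widenByIterators) ||
           std::regex_search(line, narrowString);
}
```

A build step or unit test could run this over each source line and fail on any hit outside an explicit allow-list.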
Already-landed point fixes (context, not a substitute for this work)
- DiskSettings.cpp: WideToNarrowAscii/NarrowToWideAscii → UTF-8 WideToUtf8/Utf8ToWide (CP_UTF8).
- DiskMru.cpp: ToUtf8/FromUtf8 → u8string/char8_t.
- DiskManager.cpp: drive-label widen wstring(begin,end) → fs::path(src).wstring().
Acceptance criteria
- Every project builds with UNICODE/_UNICODE; all Win32 calls are wide.
- No path-bearing API in any project (cores included) takes/returns narrow std::string; no fs::path::string()/path(string) or wstring(begin,end) conversions remain on path values.
- The JSON parser/writer are the only UTF-8 boundary; JsonValue holds wide strings; on-disk JSON stays valid UTF-8.
- Mounting, persisting, recalling, and labeling a disk image whose filename contains non-ASCII and non-ANSI (CJK) characters all round-trip, open, and render correctly.
- CassoCli handles wide argv/paths (wmain).