Commit 9980dc4

hilgertm authored and dcherednik committed
at3: write fact chunk, fix bytes_per_frame and chunk sizes
The AT3-in-WAV writer produces headers that Sony's psp_at3tool rejects for files longer than around forty seconds. The tool prints "input file is illegal file or over 2G Byte" and refuses to decode. ffmpeg accepts the file but decodes it without any encoder-delay compensation, leaving a variable lag of several hundred samples relative to the source. The two observations have a common root cause: the header we write is missing fields that downstream decoders rely on. This patch addresses three concrete issues in src/at3.cpp.

First, the writer emits no fact chunk. The fact chunk is optional in the general RIFF specification but is how WAVEFORMATEX based codecs announce the total number of decoded samples per channel. psp_at3tool uses the sample count together with samples-per-frame to decide how much PCM to produce and where to stop. Without a fact chunk the tool falls back to a short default and either truncates output or, for longer streams, rejects the file outright. ffmpeg uses the same field to skip encoder priming samples. Sony's own AT3 files carry this chunk with a fixed eight byte payload containing total_samples and samples_per_frame. We now write the same structure.

Second, the bytes_per_frame field in the ATRAC3 extradata was hardcoded to 0x10 with an XXX comment. The correct value for standard ATRAC3 is 0x1000, that is 4096, which corresponds to the PCM bytes represented by one frame (1024 samples per channel times two channels times two bytes per sample). Sony's encoder writes 4096 at this offset and both ffmpeg and psp_at3tool validate against that number. The previous value of sixteen bytes per frame is nonsensical and was part of why psp_at3tool misestimated the playback length.

Third, the RIFF chunk_size field was being written as the full file size. By the RIFF specification this field should hold the size of everything that follows the field itself, that is file_size minus eight. Writing the full size is tolerated by ffmpeg but violates the specification and makes the file look larger than it is to strict parsers.

Because the PCM engine can flush additional frames after the initially estimated numFrames count (due to look-ahead tail during encoding), the three length fields chunk_size, total_samples, and subchunk2_size were stale by one to three frames relative to the actual data on disk. To keep them consistent, TAt3 now counts frames as WriteFrame is called and seeks back to overwrite the three length fields in the destructor, so the final file describes its real contents.

The patch is purely a container metadata fix. The encoded AT3 payload is byte-identical to before. After this change, output from atracdenc for long test tracks (90 and 186 seconds, 132 kbps LP2) is accepted and fully decoded by psp_at3tool in a single pass, and ffmpeg decodes with a constant small codec latency instead of the previous variable drift. This made it possible to run a proper triple comparison against Sony's reference encoder, which previously looked catastrophic (gap around -22 dB SNR) purely due to the alignment problem but sits at roughly -0.5 to -1.4 dB SNR once the container headers are correct.

Signed-off-by: hilman2 <hilman2@gmail.com>
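The length bookkeeping described in the commit message can be sketched in isolation. The following is a minimal illustration and not code from the patch: the names `TLengthFields` and `ComputeLengthFields` are hypothetical, and the header size is passed in as a parameter because the full `At3WaveHeader` layout is not shown here. It derives the three length fields from the number of frames actually written and refuses to produce values that would not fit in 32 bits.

```cpp
#include <cstdint>

// Constants as stated in the commit message for standard ATRAC3.
constexpr uint32_t kSamplesPerFrame = 1024;
constexpr uint32_t kChannels = 2;
constexpr uint32_t kBytesPerSample = 2;

// bytes_per_frame: PCM bytes represented by one frame, 1024 * 2 * 2 = 0x1000.
static_assert(kSamplesPerFrame * kChannels * kBytesPerSample == 0x1000,
              "bytes_per_frame should be 4096");

// Hypothetical container for the three fields the patch backfills.
struct TLengthFields {
    uint32_t ChunkSize;    // RIFF chunk_size = file size - 8
    uint32_t TotalSamples; // fact total_samples, per channel
    uint32_t DataSize;     // data subchunk2_size
};

// Returns false if any field would overflow 32 bits or the sizes are invalid.
bool ComputeLengthFields(uint32_t headerSize, uint64_t framesWritten,
                         uint32_t frameSize, TLengthFields* out) {
    if (framesWritten > UINT32_MAX) {
        return false; // also keeps the 64-bit products below from overflowing
    }
    const uint64_t dataSize = framesWritten * frameSize;
    const uint64_t fileSize = uint64_t(headerSize) + dataSize;
    const uint64_t totalSamples = framesWritten * kSamplesPerFrame;
    if (fileSize < 8 || fileSize - 8 > UINT32_MAX || totalSamples > UINT32_MAX) {
        return false;
    }
    out->ChunkSize = uint32_t(fileSize - 8);
    out->TotalSamples = uint32_t(totalSamples);
    out->DataSize = uint32_t(dataSize);
    return true;
}
```

For example, with an assumed 80-byte header, 10 frames and a 384-byte frame, the data size is 3840 bytes, the file is 3920 bytes, chunk_size is 3912 and total_samples is 10240.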
1 parent a958b27 commit 9980dc4

1 file changed

Lines changed: 50 additions & 5 deletions

src/at3.cpp

@@ -64,12 +64,20 @@ At3WaveHeader {
 
     // atrac3 extradata
     uint16_t unknown0; // always 1
-    uint32_t bytes_per_frame; // samples per channel (ffmpeg) or bytes per frame (libnetmd)
+    uint32_t bytes_per_frame; // PCM bytes represented per frame = 1024 samples * 2ch * 2B = 0x1000
     uint16_t coding_mode; // 1 = joint stereo, 0 = stereo
     uint16_t coding_mode2; // same as <coding_mode>
     uint16_t unknown1; // always 1
     uint16_t unknown2; // always 0
 
+    // "fact" subchunk — required by Sony's psp_at3tool decoder and by ffmpeg
+    // for encoder-delay compensation. Without it, PSP tool rejects files
+    // > ~40 s with "input file is illegal file or over 2G Byte".
+    char fact_id[4];
+    uint32_t fact_size; // 8
+    uint32_t total_samples; // total PCM samples per channel
+    uint32_t samples_per_frame; // 1024 for ATRAC3
+
     // "data" subchunk
     char subchunk2_id[4];
     uint32_t subchunk2_size;
@@ -83,6 +91,8 @@ class TAt3 : public ICompressedOutput {
     TAt3(const std::string &filename, size_t numChannels,
          uint32_t numFrames, uint32_t frameSize, bool jointStereo)
         : fp(fopen(filename.c_str(), "wb"))
+        , FrameSize(frameSize)
+        , FramesWritten(0)
     {
         if (!fp) {
             throw std::runtime_error("Cannot open file to write");
@@ -98,11 +108,14 @@ class TAt3 : public ICompressedOutput {
         }
 
         memcpy(header.riff_chunk_id, "RIFF", 4);
-        header.chunk_size = swapbyte32_on_be(file_size);
+        // RIFF spec: chunk_size is the size of everything after this field,
+        // i.e. file_size - 8 (RIFF marker + size field itself).
+        header.chunk_size = swapbyte32_on_be(file_size - 8);
         memcpy(header.riff_format, "WAVE", 4);
 
         memcpy(header.subchunk1_id, "fmt ", 4);
-        header.subchunk1_size = swapbyte32_on_be(offsetof(struct At3WaveHeader, subchunk2_id) -
+        // fmt chunk ends where the next chunk ("fact") begins.
+        header.subchunk1_size = swapbyte32_on_be(offsetof(struct At3WaveHeader, fact_id) -
                                                  offsetof(struct At3WaveHeader, audio_format));
 
         // libnetmd: #define NETMD_RIFF_FORMAT_TAG_ATRAC3 0x270
@@ -114,16 +127,23 @@ class TAt3 : public ICompressedOutput {
         header.byte_rate = swapbyte32_on_be(frameSize * header.sample_rate / 1024);
         header.block_align = swapbyte16_on_be(frameSize);
         header.bits_per_sample = swapbyte16_on_be(0);
-        header.extradata_size = swapbyte16_on_be(offsetof(struct At3WaveHeader, subchunk2_id) -
+        header.extradata_size = swapbyte16_on_be(offsetof(struct At3WaveHeader, fact_id) -
                                                  offsetof(struct At3WaveHeader, unknown0));
 
         header.unknown0 = swapbyte16_on_be(1);
-        header.bytes_per_frame = swapbyte32_on_be(0x0010); // XXX
+        // 1024 samples × 2 channels × 2 bytes = 4096 (0x1000). Sony's encoder
+        // writes this value; PSP tool and ffmpeg rely on it for frame sizing.
+        header.bytes_per_frame = swapbyte32_on_be(0x1000);
         header.coding_mode = swapbyte16_on_be(jointStereo ? 0x0001 : 0x0000);
         header.coding_mode2 = header.coding_mode; // already byte-swapped (if needed)
         header.unknown1 = swapbyte16_on_be(1);
         header.unknown2 = swapbyte16_on_be(0);
 
+        memcpy(header.fact_id, "fact", 4);
+        header.fact_size = swapbyte32_on_be(8);
+        header.total_samples = swapbyte32_on_be(uint32_t(numFrames) * 1024);
+        header.samples_per_frame = swapbyte32_on_be(1024);
+
         memcpy(header.subchunk2_id, "data", 4);
         header.subchunk2_size = swapbyte32_on_be(numFrames * frameSize); // TODO
 
@@ -133,13 +153,36 @@ class TAt3 : public ICompressedOutput {
     }
 
     virtual ~TAt3() override {
+        // The PCM engine can flush more frames than initially estimated
+        // (encoder look-ahead tail). Backfill the length fields so
+        // RIFF chunk_size, fact total_samples, and data subchunk_size
+        // reflect the actual frame count on disk.
+        if (FramesWritten > 0) {
+            const uint64_t actualFileSize = sizeof(struct At3WaveHeader) +
+                uint64_t(FramesWritten) * uint64_t(FrameSize);
+            if (actualFileSize < UINT32_MAX) {
+                const uint32_t chunkSize = uint32_t(actualFileSize - 8);
+                const uint32_t totalSamples = uint32_t(FramesWritten) * 1024u;
+                const uint32_t dataSize = uint32_t(FramesWritten) * FrameSize;
+                const uint32_t chunkSizeLE = swapbyte32_on_be(chunkSize);
+                const uint32_t totalSamplesLE = swapbyte32_on_be(totalSamples);
+                const uint32_t dataSizeLE = swapbyte32_on_be(dataSize);
+                fseek(fp, offsetof(struct At3WaveHeader, chunk_size), SEEK_SET);
+                fwrite(&chunkSizeLE, sizeof(uint32_t), 1, fp);
+                fseek(fp, offsetof(struct At3WaveHeader, total_samples), SEEK_SET);
+                fwrite(&totalSamplesLE, sizeof(uint32_t), 1, fp);
+                fseek(fp, offsetof(struct At3WaveHeader, subchunk2_size), SEEK_SET);
+                fwrite(&dataSizeLE, sizeof(uint32_t), 1, fp);
+            }
+        }
         fclose(fp);
     }
 
     virtual void WriteFrame(std::vector<char> data) override {
         if (fwrite(data.data(), 1, data.size(), fp) != data.size()) {
             throw std::runtime_error("Cannot write AT3 data to file");
         }
+        ++FramesWritten;
     }
 
     std::string GetName() const override {
@@ -152,6 +195,8 @@ class TAt3 : public ICompressedOutput {
 
 private:
     FILE *fp;
+    uint32_t FrameSize;
+    uint64_t FramesWritten;
 };
 
 } //namespace
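
The destructor in this diff seeks back and overwrites three 32-bit fields without checking the results of fseek and fwrite. As a hedged illustration only, the same seek-and-patch step could be factored into a small helper that reports failure to the caller. The helper name `PatchU32LE` is hypothetical and not part of the patch. It serializes the value byte by byte, so it produces little-endian output on any host without relying on `swapbyte32_on_be`.

```cpp
#include <cstdint>
#include <cstdio>

// Overwrite a 32-bit little-endian field at a fixed offset in an open file.
// Returns false if seeking or writing fails. On success the file position
// is left at the end of the file.
bool PatchU32LE(std::FILE* fp, long offset, uint32_t value) {
    const unsigned char bytes[4] = {
        static_cast<unsigned char>(value & 0xFF),
        static_cast<unsigned char>((value >> 8) & 0xFF),
        static_cast<unsigned char>((value >> 16) & 0xFF),
        static_cast<unsigned char>((value >> 24) & 0xFF),
    };
    if (std::fseek(fp, offset, SEEK_SET) != 0) {
        return false;
    }
    if (std::fwrite(bytes, 1, sizeof(bytes), fp) != sizeof(bytes)) {
        return false;
    }
    return std::fseek(fp, 0, SEEK_END) == 0;
}
```

A caller could invoke it once per field, for instance with `offsetof(struct At3WaveHeader, chunk_size)` as the offset, and log or otherwise handle a `false` result, since throwing from a destructor is generally unsafe.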

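One way to sanity-check a header written with this layout is to walk the RIFF chunks and read back the fact payload. The sketch below is an assumption-laden illustration, not how psp_at3tool or ffmpeg parse the file: `TFactInfo` and `FindFactChunk` are hypothetical names, the input is an in-memory copy of the start of the file, and the eight-byte payload is read as total_samples followed by samples_per_frame, as laid out in this patch.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical result type for the two fact fields written by the patch.
struct TFactInfo {
    uint32_t TotalSamples;
    uint32_t SamplesPerFrame;
};

static uint32_t ReadU32LE(const uint8_t* p) {
    return uint32_t(p[0]) | (uint32_t(p[1]) << 8) |
           (uint32_t(p[2]) << 16) | (uint32_t(p[3]) << 24);
}

// Walks the chunks after the 12-byte RIFF/WAVE preamble and returns true
// if a "fact" chunk with at least an 8-byte payload is found.
bool FindFactChunk(const std::vector<uint8_t>& buf, TFactInfo* out) {
    if (buf.size() < 12 ||
        std::memcmp(buf.data(), "RIFF", 4) != 0 ||
        std::memcmp(buf.data() + 8, "WAVE", 4) != 0) {
        return false;
    }
    size_t pos = 12;
    while (pos + 8 <= buf.size()) {
        const uint8_t* hdr = buf.data() + pos;
        const uint32_t size = ReadU32LE(hdr + 4);
        const size_t body = pos + 8;
        if (size > buf.size() - body) {
            return false; // chunk body extends past the buffer
        }
        if (std::memcmp(hdr, "fact", 4) == 0) {
            if (size < 8) {
                return false;
            }
            out->TotalSamples = ReadU32LE(buf.data() + body);
            out->SamplesPerFrame = ReadU32LE(buf.data() + body + 4);
            return true;
        }
        pos = body + size + (size & 1); // RIFF chunks are padded to even size
    }
    return false;
}
```

Because the fact chunk sits before the data chunk in this header, reading only the first few hundred bytes of the file into the buffer is enough for the check.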