a from-scratch jpeg encoder and decoder written in go. no external dependencies, just the standard library.
built this while writing a paper on image compression. wanted to really understand what's happening under the hood. ai did the heavy lifting :)
jpeg-go/
├── main.go               # example program
├── go.mod
└── jpeg/
    ├── constants.go      # markers, quant tables, huffman tables
    ├── color.go          # rgb to ycbcr, subsampling
    ├── dct.go            # discrete cosine transform
    ├── quantization.go   # quantization, zigzag, rle
    ├── huffman.go        # huffman coding
    ├── encoder.go        # jpeg encoder
    └── decoder.go        # jpeg decoder
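// encode input.png as output.jpg
package main

import (
	"image/png"
	"log"
	"os"

	"jpeg-go/jpeg"
)

func main() {
	file, err := os.Open("input.png")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()
	img, err := png.Decode(file)
	if err != nil {
		log.Fatal(err)
	}
	output, err := os.Create("output.jpg")
	if err != nil {
		log.Fatal(err)
	}
	defer output.Close()
	encoder := jpeg.NewEncoder(75) // quality 1-100
	encoder.Encode(output, img)
}

// decode input.jpg and save it as output.png
package main

import (
	"image/png"
	"log"
	"os"

	"jpeg-go/jpeg"
)

func main() {
	file, err := os.Open("input.jpg")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()
	img, err := jpeg.DecodeImage(file)
	if err != nil {
		log.Fatal(err)
	}
	output, err := os.Create("output.png")
	if err != nil {
		log.Fatal(err)
	}
	defer output.Close()
	if err := png.Encode(output, img); err != nil {
		log.Fatal(err)
	}
}

- 1-10: crunchy as hell, but tiny files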
- 50: decent balance
- 75: the sweet spot for most stuff
- 90-100: basically lossless looking, bigger files
the whole point of jpeg is that humans are way better at seeing brightness differences than color differences. so we can throw away a lot of color info and nobody notices.
first we split the image into brightness (Y) and color (Cb, Cr). this lets us treat them separately.
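a minimal sketch of that split, assuming the full-range conversion formulas from the JFIF spec. `rgbToYCbCr` and `clamp8` are names made up for this example, not necessarily what color.go uses:

```go
package main

import (
	"fmt"
	"math"
)

// clamp8 rounds to the nearest integer and clamps to the 0-255 range.
func clamp8(v float64) uint8 {
	v = math.Round(v)
	if v < 0 {
		return 0
	}
	if v > 255 {
		return 255
	}
	return uint8(v)
}

// rgbToYCbCr converts one 8-bit RGB pixel using the JFIF full-range formulas.
// hypothetical helper written for this example.
func rgbToYCbCr(r, g, b uint8) (y, cb, cr uint8) {
	rf, gf, bf := float64(r), float64(g), float64(b)
	y = clamp8(0.299*rf + 0.587*gf + 0.114*bf)
	cb = clamp8(-0.168736*rf - 0.331264*gf + 0.5*bf + 128)
	cr = clamp8(0.5*rf - 0.418688*gf - 0.081312*bf + 128)
	return y, cb, cr
}

func main() {
	y, cb, cr := rgbToYCbCr(200, 120, 40)
	fmt.Println(y, cb, cr)
}
```

any pure gray pixel lands on Cb = Cr = 128, which is why the color planes of a grayscale image compress to almost nothing.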
since eyes don't care much about color resolution, we shrink Cb and Cr to 1/4 the size (half in each direction). Y stays full size, so the total drops from 3 values per pixel to 1.5. that's already 50% smaller and you can barely tell.
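roughly what that shrink looks like, as a sketch. this version averages each 2x2 group and repeats the last row/column on odd-sized images; both choices are assumptions for the example, and `subsample420` is a made-up name:

```go
package main

import "fmt"

// minInt returns the smaller of two ints.
func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// subsample420 halves a row-major chroma plane in both directions by
// averaging each 2x2 group. edge pixels are repeated when w or h is odd.
func subsample420(plane []uint8, w, h int) (out []uint8, ow, oh int) {
	ow, oh = (w+1)/2, (h+1)/2
	out = make([]uint8, ow*oh)
	for oy := 0; oy < oh; oy++ {
		for ox := 0; ox < ow; ox++ {
			sum := 0
			for dy := 0; dy < 2; dy++ {
				for dx := 0; dx < 2; dx++ {
					x := minInt(2*ox+dx, w-1)
					y := minInt(2*oy+dy, h-1)
					sum += int(plane[y*w+x])
				}
			}
			out[oy*ow+ox] = uint8((sum + 2) / 4) // rounded average
		}
	}
	return out, ow, oh
}

func main() {
	cb := []uint8{
		10, 20, 30, 40,
		30, 40, 50, 60,
	}
	small, w, h := subsample420(cb, 4, 2)
	fmt.Println(small, w, h)
}
```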
the image gets chopped into 8x8 pixel blocks. each one is processed on its own.
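a sketch of pulling out one block. images are rarely an exact multiple of 8, so this example pads by repeating the edge pixels (one common choice, assumed here). `extractBlock` is a hypothetical helper:

```go
package main

import "fmt"

// extractBlock copies the 8x8 block at block coordinates (bx, by) out of a
// row-major plane of size w x h. pixels past the right or bottom edge are
// filled by repeating the nearest edge pixel.
func extractBlock(plane []uint8, w, h, bx, by int) [64]uint8 {
	var block [64]uint8
	for row := 0; row < 8; row++ {
		y := by*8 + row
		if y >= h {
			y = h - 1
		}
		for col := 0; col < 8; col++ {
			x := bx*8 + col
			if x >= w {
				x = w - 1
			}
			block[row*8+col] = plane[y*w+x]
		}
	}
	return block
}

func main() {
	w, h := 10, 9
	plane := make([]uint8, w*h)
	for i := range plane {
		plane[i] = uint8(i)
	}
	// number of blocks needed to cover the image in each direction
	blocksX, blocksY := (w+7)/8, (h+7)/8
	fmt.Println(blocksX, blocksY)
	fmt.Println(extractBlock(plane, w, h, 1, 1))
}
```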
each block goes through a discrete cosine transform. this converts pixel values into frequency components - basically "how much low frequency stuff vs high frequency stuff is in this block".
the cool thing is most of the important visual info ends up in the low frequencies (top-left of the transformed block). the high frequency stuff (bottom-right) is usually small values we can safely throw away.
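here's the textbook 2d dct-ii written out the slow way (four nested loops), just to show what the transform computes. real encoders use a faster factored version and usually subtract 128 from each sample first. `dct8x8` and `alpha` are names invented for this sketch:

```go
package main

import (
	"fmt"
	"math"
)

// alpha is the normalization factor: 1/sqrt(2) for the zero frequency, 1 otherwise.
func alpha(k int) float64 {
	if k == 0 {
		return 1 / math.Sqrt2
	}
	return 1
}

// dct8x8 computes the forward 2d dct-ii of one block.
// out[v][u] holds vertical frequency v and horizontal frequency u,
// so out[0][0] is the DC (average) term in the top-left corner.
func dct8x8(in [8][8]float64) (out [8][8]float64) {
	for v := 0; v < 8; v++ {
		for u := 0; u < 8; u++ {
			sum := 0.0
			for y := 0; y < 8; y++ {
				for x := 0; x < 8; x++ {
					sum += in[y][x] *
						math.Cos(float64(2*x+1)*float64(u)*math.Pi/16) *
						math.Cos(float64(2*y+1)*float64(v)*math.Pi/16)
				}
			}
			out[v][u] = 0.25 * alpha(u) * alpha(v) * sum
		}
	}
	return out
}

func main() {
	// a smooth horizontal gradient: energy should pile up in the first row
	var block [8][8]float64
	for y := 0; y < 8; y++ {
		for x := 0; x < 8; x++ {
			block[y][x] = float64(x * 10)
		}
	}
	coeffs := dct8x8(block)
	for _, row := range coeffs {
		fmt.Printf("%8.2f\n", row)
	}
}
```

a completely flat block ends up with a single nonzero value (the DC term) and zeros everywhere else, which is the extreme case of "most of the info is in the top-left".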
this is the lossy step, and where most of the actual compression comes from. we divide each dct value by the matching number from a quantization table and round the result. lots of the high-frequency values become zero.
lower quality = bigger divisors = more zeros = smaller file = more artifacts
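a sketch of how quality can turn into divisors. the base table is the example luminance table from Annex K of the spec; the scaling formula is the one IJG's libjpeg uses. whether this repo scales its tables the same way is an assumption, and `scaleTable` / `quantize` are made-up names:

```go
package main

import (
	"fmt"
	"math"
)

// baseLuma is the example luminance quantization table from ITU-T T.81 Annex K.
var baseLuma = [64]int{
	16, 11, 10, 16, 24, 40, 51, 61,
	12, 12, 14, 19, 26, 58, 60, 55,
	14, 13, 16, 24, 40, 57, 69, 56,
	14, 17, 22, 29, 51, 87, 80, 62,
	18, 22, 37, 56, 68, 109, 103, 77,
	24, 35, 55, 64, 81, 104, 113, 92,
	49, 64, 78, 87, 103, 121, 120, 101,
	72, 92, 95, 98, 112, 100, 103, 99,
}

// scaleTable scales the base table for a quality of 1-100 (IJG-style).
func scaleTable(quality int) [64]int {
	if quality < 1 {
		quality = 1
	}
	if quality > 100 {
		quality = 100
	}
	scale := 200 - 2*quality
	if quality < 50 {
		scale = 5000 / quality
	}
	var out [64]int
	for i, q := range baseLuma {
		v := (q*scale + 50) / 100
		if v < 1 {
			v = 1
		}
		if v > 255 {
			v = 255
		}
		out[i] = v
	}
	return out
}

// quantize divides each coefficient by its table entry and rounds.
func quantize(coeffs [64]float64, table [64]int) [64]int {
	var out [64]int
	for i, c := range coeffs {
		out[i] = int(math.Round(c / float64(table[i])))
	}
	return out
}

func main() {
	for _, q := range []int{10, 50, 90} {
		t := scaleTable(q)
		fmt.Printf("quality %d: first divisors %v\n", q, t[:8])
	}
}
```

quality 50 leaves the base table unchanged, and quality 100 collapses every divisor to 1, so the only loss left is rounding.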
we read the 8x8 block in a zigzag pattern starting from top-left. this groups all the zeros (from the high frequencies) together at the end.
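the zigzag order is usually a hardcoded 64-entry table, but it can also be generated by walking the anti-diagonals and flipping direction each time. a sketch, with `zigzagOrder` and `zigzag` as invented names:

```go
package main

import "fmt"

// zigzagOrder returns, for each position in the zigzag scan, the index of
// the coefficient in the row-major 8x8 block.
func zigzagOrder() [64]int {
	var order [64]int
	i := 0
	for s := 0; s < 15; s++ { // s = row + col identifies one anti-diagonal
		lo, hi := 0, s
		if s > 7 {
			lo, hi = s-7, 7
		}
		if s%2 == 1 {
			// odd diagonals run from top-right to bottom-left
			for r := lo; r <= hi; r++ {
				order[i] = r*8 + (s - r)
				i++
			}
		} else {
			// even diagonals run from bottom-left to top-right
			for r := hi; r >= lo; r-- {
				order[i] = r*8 + (s - r)
				i++
			}
		}
	}
	return order
}

// zigzag reorders a row-major block into zigzag scan order.
func zigzag(block [64]int) [64]int {
	var out [64]int
	for i, idx := range zigzagOrder() {
		out[i] = block[idx]
	}
	return out
}

func main() {
	fmt.Println(zigzagOrder())
}
```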
now we encode it as "5 zeros then a 12, 3 zeros then a -4, etc". way more compact than storing all those zeros individually.
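a sketch of that run-length step for the 63 AC values of one block. jpeg caps a run at 15, so longer runs get split with a (15, 0) marker that stands for 16 zeros, and trailing zeros collapse into a single end-of-block (0, 0) pair. `pair` and `runLength` are names made up for this example:

```go
package main

import "fmt"

// pair is "Run zeros, then Value".
type pair struct {
	Run   int
	Value int
}

// runLength turns zigzag-ordered AC coefficients into (run, value) pairs.
func runLength(ac []int) []pair {
	var out []pair
	run := 0
	for _, v := range ac {
		if v == 0 {
			run++
			continue
		}
		for run > 15 {
			out = append(out, pair{15, 0}) // ZRL: a full run of 16 zeros
			run -= 16
		}
		out = append(out, pair{run, v})
		run = 0
	}
	if run > 0 {
		out = append(out, pair{0, 0}) // EOB: nothing but zeros left
	}
	return out
}

func main() {
	ac := make([]int, 63)
	ac[5] = 12
	ac[9] = -4
	fmt.Println(runLength(ac))
}
```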
finally, everything gets huffman encoded. common patterns get short codes, rare ones get longer codes. standard entropy compression stuff.
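jpeg stores a huffman table as 16 counts (how many codes of each length) plus a list of symbols, and the actual bit patterns are rebuilt from that. a sketch of the rebuild, using the standard luminance DC table from Annex K of the spec as input. `huffCode` and `buildCodes` are hypothetical names, and this only covers code assignment, not the bit writer:

```go
package main

import "fmt"

// huffCode is one code: the low Len bits of Bits, most significant bit first.
type huffCode struct {
	Bits uint16
	Len  int
}

// buildCodes assigns canonical codes: symbols are listed shortest code first,
// and counts[i] says how many symbols have a code of length i+1.
func buildCodes(counts [16]int, symbols []byte) map[byte]huffCode {
	codes := make(map[byte]huffCode)
	next := uint16(0) // next unused code at the current length
	k := 0            // index into symbols
	for length := 1; length <= 16; length++ {
		for i := 0; i < counts[length-1]; i++ {
			codes[symbols[k]] = huffCode{next, length}
			next++
			k++
		}
		next <<= 1 // moving to longer codes appends a zero bit
	}
	return codes
}

func main() {
	// standard luminance DC table (ITU-T T.81, Annex K)
	counts := [16]int{0, 1, 5, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0}
	symbols := []byte{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}
	codes := buildCodes(counts, symbols)
	for _, s := range symbols {
		c := codes[s]
		fmt.Printf("symbol %2d -> %0*b\n", s, c.Len, c.Bits)
	}
}
```

the small categories (the common ones) get 2-3 bit codes while the rare large ones go up to 9 bits.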
go build -o jpeg-compressor
./jpeg-compressor

this'll generate some test images at different quality levels so you can see the difference.
- ITU-T T.81 (the actual jpeg spec)
- JFIF spec
- IJG documentation
- this video got me interested in building this
