net: 512 byte DNS response size limit causes "cannot unmarshal DNS" error #51127

Closed
@AaronFriel

So, you found this issue googling for "cannot unmarshal DNS"

There's good news: your issue has largely been fixed. I originally filed the issue below because I ran into it on my own network and operating system, but further investigation showed that it has affected every major OS, as well as users of VPNs, DNS providers written in Go, and more.

If you are a maintainer of code and someone has reported this issue to you: update your build system to use Go 1.16.15, Go 1.17.8, or Go 1.18. Rebuilding with one of those releases should make the error go away for your users.

If you are a user of a program and see this error, you need to ask the maintainer or creator of that program to do likewise. Unfortunately, there isn't a single set of workaround instructions I can give. If you're using a VPN, try running the program off the VPN; that is the most common user-reported scenario I've seen.
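For maintainers who are unsure which Go release a binary was built with, here is a minimal sketch that compares `runtime.Version()` against the releases named above. `hasDNSFix` is a hypothetical helper written for this note, not part of any library, and it assumes the usual `go1.X.Y` version string format.

```go
package main

import (
	"fmt"
	"runtime"
)

// hasDNSFix reports whether a Go version string such as "go1.17.8" is at or
// beyond one of the releases named in this issue (1.16.15, 1.17.8, 1.18).
// Hypothetical helper: pre-releases such as "go1.18rc1" are treated like the
// release they precede, which is an assumption.
func hasDNSFix(version string) bool {
	var major, minor, patch int
	// Sscanf stops at the first mismatch, so "go1.18" parses two fields and
	// leaves patch at 0. Only the number of parsed fields is checked.
	n, _ := fmt.Sscanf(version, "go%d.%d.%d", &major, &minor, &patch)
	if n < 2 || major != 1 {
		return false // unrecognized format, for example a development build
	}
	switch {
	case minor >= 18:
		return true
	case minor == 17:
		return patch >= 8
	case minor == 16:
		return patch >= 15
	default:
		return false
	}
}

func main() {
	v := runtime.Version()
	fmt.Printf("built with %s, includes fix: %v\n", v, hasDNSFix(v))
}
```

For an already-built binary, running `go version` with the path to the binary prints the Go release it was built with, which avoids changing any code.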


Original bug report:

What version of Go are you using (go version)?

$ go version
go version go1.17.6 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

Note: WSL2 on Windows. This is relevant, but it is not the only scenario in which the issue can occur; see below.

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/friel/.cache/go-build"
GOENV="/home/friel/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/friel/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/friel/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/friel/.local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/friel/.local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.17.6"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/friel/go/src/github.com/pulumi/pulumi-yaml/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build3112884807=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Use infrastructure as code tools to manage Azure, and/or attempt to execute net.LookupIP("management.azure.com").

Example program:

package main

import (
	"fmt"
	"net"
)

func main() {
	ips, err := net.LookupIP("management.azure.com")
	if err != nil {
		panic(err)
	}
	for _, ip := range ips {
		fmt.Println(ip) // one address per line
	}
}

What did you expect to see?

I expected to see the current IP, 13.86.219.80, as shown by the last line of:

$ host management.azure.com
management.azure.com is an alias for management.privatelink.azure.com.
management.privatelink.azure.com is an alias for arm-frontdoor-prod.trafficmanager.net.
arm-frontdoor-prod.trafficmanager.net is an alias for westus.management.azure.com.
westus.management.azure.com is an alias for arm-frontdoor-westus.trafficmanager.net.
arm-frontdoor-westus.trafficmanager.net is an alias for westus.cs.management.azure.com.
westus.cs.management.azure.com is an alias for rpfd-prod-by-01.cloudapp.net.
rpfd-prod-by-01.cloudapp.net has address 13.86.219.80

What did you see instead?

$ go run resolve-test.go 
panic: lookup management.azure.com on 172.20.32.1:53: cannot unmarshal DNS message

goroutine 1 [running]:
main.main()
        /home/friel/c/resolve-test/resolve-test.go:11 +0xe8
exit status 2

Miscellany

It looks like this issue is widely affecting infrastructure as code tools such as Pulumi, Terraform, and others when they make API calls to Microsoft Azure on the Windows Subsystem for Linux 2, on Microsoft Windows.

This is a bit of a rock and a hard place situation. Microsoft is unlikely to update their DNS server to adhere to the pre-1999 DNS specification and its 512-byte response limit. The Go language team is in a position to be much more agile and issue a point release that supports a larger buffer size. Even going up to a single standard MTU of ~1500 bytes would resolve this issue in the near term.

As this problem primarily affects programs written in Go, in this author's estimation a change in Windows' DNS server behavior is unlikely to happen as quickly, even if the stars were to align on the need to change the implementation. Note that host, dig, nslookup, etc. all behave correctly.
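For programs that cannot be rebuilt with a newer Go release right away, one possible stopgap is a custom `net.Resolver` that sends queries over TCP, where responses are length-prefixed instead of being read into a fixed-size UDP buffer. This is a sketch under assumptions: it only helps if the DNS server answers on TCP, and it has not been verified against the WSL2 resolver described here. `newTCPResolver` is a hypothetical helper name.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// newTCPResolver returns a resolver that uses Go's built-in DNS client but
// dials the server over TCP regardless of the network it was asked to use.
func newTCPResolver() *net.Resolver {
	return &net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, network, address string) (net.Conn, error) {
			d := net.Dialer{Timeout: 5 * time.Second}
			// Ignore the requested network (normally "udp") and use TCP.
			return d.DialContext(ctx, "tcp", address)
		},
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	addrs, err := newTCPResolver().LookupIPAddr(ctx, "management.azure.com")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	for _, a := range addrs {
		fmt.Println(a.IP)
	}
}
```

The resolver has to be passed explicitly (or assigned to `net.DefaultResolver`) for a program's lookups to use it, so this is only practical for code you control.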

Collected notes and root cause analysis:

DNS Flag Day 2020 had an explicit goal of ensuring that resolvers had a minimum accepted buffer size of 1232 bytes: https://dnsflagday.net/2020/#action-dns-resolver-operators
