- Gitea version (or commit ref): Docker image - "latest" tag
- Git version: 2a9b8d1 - from the Docker image's metadata
- Operating system: Ubuntu Trusty, running Gitea in Docker 18 with the public gitea/gitea image.
- Database (use `[x]`):
  - [x] SQLite
- Can you reproduce the bug at https://try.gitea.io:
  - [x] No
- Log gist:
I've had this happen a few times, but not yet with debug logging enabled. It's on now; I will update. I didn't see anything out of the ordinary in the default log that I had.
I did however get a trace of all the goroutines: https://gist.github.com/dpedu/86b729acd51d2132950328a4040e0092
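For reference, this is a minimal sketch of how a Go process can be made to dump all goroutine stacks using only the standard library (the HTTP port and signal choice here are illustrative assumptions; this is not necessarily how Gitea exposes pprof):

```go
package main

import (
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers on the default mux
	"os"
	"os/signal"
	"runtime/pprof"
	"syscall"
)

func main() {
	// Option 1: expose pprof over HTTP (port chosen here for illustration only).
	// A full goroutine dump is then available at
	// http://localhost:6060/debug/pprof/goroutine?debug=2
	go func() {
		_ = http.ListenAndServe("localhost:6060", nil)
	}()

	// Option 2: dump all goroutine stacks to stderr when SIGUSR1 arrives.
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGUSR1)
	go func() {
		for range sigs {
			_ = pprof.Lookup("goroutine").WriteTo(os.Stderr, 2)
		}
	}()

	select {} // block forever; stands in for the real server loop
}
```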
Description
I run a web-facing copy of Gitea (not exposed directly; it sits behind Nginx and Varnish) of which I am the only user. After 2-3 days of uptime, the instance stops responding to HTTP requests, and the reverse proxy in front of it shows a 504 timeout error.
I can connect to the Gitea instance directly with my browser; it accepts the connection but never responds to HTTP requests. The problem looks the same with the reverse proxies between Gitea and me removed.
The SSH interface still works fine - I can clone, push, pull, etc.
Looking at the log, Gitea logs as if it is still serving normal traffic. It prints lines like `Started GET` and `Completed GET` as usual, but inspecting the traffic in my reverse proxy shows that it is not actually replying.
Looking at the goroutine trace above, it looks like there are very many goroutines trying to acquire some lock in the template layer - more than 300! My instance was not using much CPU at all when viewed in `htop` in the bad state; it seemed like something was locked up rather than overloaded.
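To make the symptom concrete, here is a hypothetical sketch (the names and locking scheme are invented for illustration, not taken from Gitea's template code) of how one handler stalling while holding a shared render lock makes every later request park on that lock: connections are still accepted and "Started" lines are still logged, but no responses are written.

```go
package main

import (
	"html/template"
	"log"
	"net/http"
	"sync"
	"time"
)

// Hypothetical shared state guarding template rendering; purely illustrative.
var (
	tmplMu sync.Mutex
	tmpl   = template.Must(template.New("page").Parse("<h1>{{.}}</h1>"))
)

func handler(w http.ResponseWriter, r *http.Request) {
	log.Printf("Started %s %s", r.Method, r.URL.Path) // logged before the lock

	tmplMu.Lock() // every request serializes here
	defer tmplMu.Unlock()

	// If one request stalls while holding the lock (simulated below),
	// all following requests pile up on tmplMu.Lock() and the server
	// accepts connections but never writes responses.
	if r.URL.Query().Get("stall") != "" {
		time.Sleep(time.Hour)
	}
	_ = tmpl.Execute(w, r.URL.Path)
}

func main() {
	http.HandleFunc("/", handler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```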
In `ps`, there were several zombie `git` processes that were children of `gitea`. I'm not sure whether this is a cause or a result of the other problem:
```
root 25150 7788 25150 7788 0 May08 ? 00:00:17 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/0e642201efc1eeb1a8daa108d88efa2b469d915524496bf4278c7660c62dfed9 -address /var/run/docker/containerd/docker-containerd.sock -containerd-binary /usr/bin/docker-containerd -runtime-root /var/run/docker/runtime-runc
root 25168 25150 25168 25168 0 May08 ? 00:00:00 /bin/s6-svscan /etc/s6
root 25219 25168 25168 25168 0 May08 ? 00:00:00 s6-supervise gitea
1000 25221 25219 25221 25221 4 May08 ? 02:13:53 /app/gitea/gitea web
1000 13231 25221 25221 25221 0 19:58 ? 00:00:00 [git] <defunct> <---- this line repeated 16x
root 25220 25168 25168 25168 0 May08 ? 00:00:00 s6-supervise openssh
root 25222 25220 25222 25222 0 May08 ? 00:00:00 sshd: /usr/sbin/sshd -D -e [listener] 0 of 10-100 startups
```
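For context on the `<defunct>` entries: a child process started from Go stays a zombie until the parent calls Wait on it, so if whatever goroutine is responsible for reaping the `git` child is itself stuck, zombies accumulate. A minimal sketch of that mechanism (purely illustrative; not how Gitea actually invokes git):

```go
package main

import (
	"fmt"
	"os/exec"
	"time"
)

func main() {
	// Start a short-lived git command but don't Wait() on it right away.
	// Once it exits, the kernel keeps a zombie entry until the parent reaps it,
	// which is what shows up as "[git] <defunct>" under the gitea process in ps.
	cmd := exec.Command("git", "--version")
	if err := cmd.Start(); err != nil {
		fmt.Println("start failed:", err)
		return
	}

	// Simulate the parent being stuck elsewhere (e.g. blocked on a lock)
	// instead of calling cmd.Wait(), which would reap the child.
	time.Sleep(time.Minute)

	_ = cmd.Wait() // reaping finally happens here
}
```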
Here's `lsof -p` output for Gitea as well. I snipped out the "normal" stuff like my SQLite database. What was left was about 400 of the `TYPE=sock` lines, with about 10x fewer `TYPE=FIFO` lines sprinkled in.
```
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
gitea 25221 1000 84u sock 0,8 0t0 226911955 can't identify protocol
gitea 25221 1000 85u sock 0,8 0t0 226911157 can't identify protocol
gitea 25221 1000 86r FIFO 0,10 0t0 226911175 pipe
gitea 25221 1000 87u sock 0,8 0t0 226911183 can't identify protocol
gitea 25221 1000 88u sock 0,8 0t0 226911203 can't identify protocol
gitea 25221 1000 89u sock 0,8 0t0 226912114 can't identify protocol
gitea 25221 1000 90u sock 0,8 0t0 226912271 can't identify protocol
gitea 25221 1000 91r FIFO 0,10 0t0 226935531 pipe
gitea 25221 1000 92u sock 0,8 0t0 226940493 can't identify protocol
gitea 25221 1000 93u sock 0,8 0t0 226940498 can't identify protocol
```
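If it helps to watch the descriptor count grow over time, here's a small sketch (Linux-only, a standalone helper I'm describing for illustration, not part of Gitea) that classifies a process's open file descriptors from /proc instead of re-running lsof:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// Classify a process's open file descriptors by reading /proc/<pid>/fd.
// The PID is passed as the first argument, e.g. the gitea PID 25221 above.
func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: fdcount <pid>")
		return
	}
	fdDir := filepath.Join("/proc", os.Args[1], "fd")
	entries, err := os.ReadDir(fdDir)
	if err != nil {
		fmt.Println("read fd dir:", err)
		return
	}
	counts := map[string]int{}
	for _, e := range entries {
		target, err := os.Readlink(filepath.Join(fdDir, e.Name()))
		if err != nil {
			continue // fd may have closed between ReadDir and Readlink
		}
		switch {
		case strings.HasPrefix(target, "socket:"):
			counts["socket"]++ // shows up in lsof as TYPE=sock
		case strings.HasPrefix(target, "pipe:"):
			counts["pipe"]++ // shows up in lsof as TYPE=FIFO
		default:
			counts["other"]++
		}
	}
	fmt.Println(counts)
}
```

Running something like this every few minutes against the gitea PID would show whether the socket count climbs steadily or jumps when the hang begins.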
When I restart my Gitea container, everything is back to normal afterwards.
I first remember encountering this problem in mid-April. I pull the latest Docker image each time I've had to manually restart it, so it has appeared in more than one version.
There are other HTTP-based services running on the same machine that don't show similar issues.
I have lots and lots of code checked into public repos, so the instance attracts quite a bit of bot traffic scraping the HTML. Sometimes I have to user-agent-ban bots that are too aggressive and push CPU usage up too much.
Screenshots
N/A