Gitea stops responding on HTTP #15826

Closed
@dpedu

Description

  • Gitea version (or commit ref): Docker image - "latest" tag
  • Git version: 2a9b8d1 - from docker image's metadata
  • Operating system: Ubuntu Trusty, running Gitea in Docker 18 with the public gitea/gitea image.
  • Database (use [x]):
    • SQLite
  • Can you reproduce the bug at https://try.gitea.io:
    • No
  • Log gist:
    I've had this happen a few times but not yet with debug logging enabled. It's on now. Will update. I didn't see anything out-of-the-ordinary in the default log that I had.

I did however get a trace of all the goroutines: https://gist.github.com/dpedu/86b729acd51d2132950328a4040e0092

Description

I run a web-facing copy of Gitea (not directly exposed; it sits behind Nginx and Varnish) of which I am the sole user. After 2-3 days of uptime, the Gitea instance seems to stop responding to HTTP requests. The reverse proxy in front of it shows a 504 timeout error.

I can connect to the Gitea instance directly with my browser, and while it accepts my connection it never responds to HTTP requests; the problem looks the same with the reverse proxies between Gitea and me removed.

The SSH interface still works fine - I can clone, push, pull, etc.

Looking at the log, Gitea appears to still be serving normal traffic: it logs Started GET and Completed GET lines as usual, but inspecting the traffic at my reverse proxy shows it is not actually replying.

Looking at the goroutine trace above, a very large number of goroutines - more than 300! - are trying to acquire some lock in the template layer. My instance was using very little CPU when viewed in htop in the bad state; it seemed like something was locked up rather than overloaded.

In ps, there were several zombie git processes that were children of gitea. Not sure if this is a cause or a result of the other problem:

root     25150  7788 25150  7788  0 May08 ?        00:00:17       docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/0e642201efc1eeb1a8daa108d88efa2b469d915524496bf4278c7660c62dfed9 -address /var/run/docker/containerd/docker-containerd.sock -containerd-binary /usr/bin/docker-containerd -runtime-root /var/run/docker/runtime-runc
root     25168 25150 25168 25168  0 May08 ?        00:00:00         /bin/s6-svscan /etc/s6
root     25219 25168 25168 25168  0 May08 ?        00:00:00           s6-supervise gitea
1000     25221 25219 25221 25221  4 May08 ?        02:13:53             /app/gitea/gitea web
1000     13231 25221 25221 25221  0 19:58 ?        00:00:00               [git] <defunct>               <---- this line repeated 16x
root     25220 25168 25168 25168  0 May08 ?        00:00:00           s6-supervise openssh
root     25222 25220 25222 25222  0 May08 ?        00:00:00             sshd: /usr/sbin/sshd -D -e [listener] 0 of 10-100 startups

Here's lsof -p output for Gitea as well. I snipped out the "normal" entries like my SQLite database. What was left was about 400 of the TYPE=sock lines, with roughly a tenth as many TYPE=FIFO lines sprinkled in.

COMMAND   PID     USER   FD   TYPE DEVICE  SIZE/OFF      NODE NAME
gitea   25221     1000   84u  sock    0,8       0t0 226911955 can't identify protocol
gitea   25221     1000   85u  sock    0,8       0t0 226911157 can't identify protocol
gitea   25221     1000   86r  FIFO   0,10       0t0 226911175 pipe
gitea   25221     1000   87u  sock    0,8       0t0 226911183 can't identify protocol
gitea   25221     1000   88u  sock    0,8       0t0 226911203 can't identify protocol
gitea   25221     1000   89u  sock    0,8       0t0 226912114 can't identify protocol
gitea   25221     1000   90u  sock    0,8       0t0 226912271 can't identify protocol
gitea   25221     1000   91r  FIFO   0,10       0t0 226935531 pipe
gitea   25221     1000   92u  sock    0,8       0t0 226940493 can't identify protocol
gitea   25221     1000   93u  sock    0,8       0t0 226940498 can't identify protocol

When I restart my Gitea container, everything is back to normal afterwards.

I first remember encountering this problem in mid-April. I pull the latest Docker image each time I've had to manually restart it, so it has appeared in more than one version.

There are other HTTP-based services running on the same machine that don't show similar issues.

I have a lot of code checked into public repos, so the instance attracts quite a bit of bot traffic scraping the HTML. Sometimes I have to ban overly aggressive bots by user agent when they push CPU usage too high.

Screenshots

N/A
