
fix: start the plugin daemon after the database has become healthy #17928


Merged: 1 commit merged into langgenius:main from daemon_depends_on_db on Apr 13, 2025


kurokobo (Contributor) commented Apr 12, 2025

Summary

This PR makes the following changes:

  • Update plugin_daemon.depends_on to set db.condition to service_healthy, so that plugin_daemon waits for the database to become healthy before it starts.
  • Update db.healthcheck so that it verifies an external connection, using the hostname (-h), username (-U), and database name (-d).
  • Increase retries from 30 to 60, since the stricter check takes a little longer than before to report the database as healthy.
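For illustration, the three changes above correspond roughly to a docker-compose fragment like the one below. This is a sketch, not the merged diff: only the keys relevant to this PR are shown, and the user and database names (postgres, dify) and the interval and timeout values are assumptions, so match them to your own .env. The healthcheck uses pg_isready, whose -h, -U and -d flags select the host, user and database. One plausible reason the -h flag matters once plugin_daemon waits on service_healthy: the official postgres image runs a temporary, socket-only server while it initializes, so a check without -h can succeed over the Unix socket before TCP clients such as plugin_daemon can connect, which is consistent with the "connection refused" errors in the Before logs.

```yaml
services:
  db:
    healthcheck:
      # Connect through the service hostname (-h) as a given user (-U) to a
      # given database (-d), so "healthy" means a TCP client can really connect.
      # ASSUMPTION: user "postgres" and database "dify"; adjust to your .env.
      test: ["CMD", "pg_isready", "-h", "db", "-U", "postgres", "-d", "dify"]
      interval: 1s   # assumed value
      timeout: 3s    # assumed value
      retries: 60    # raised from 30 in this PR

  plugin_daemon:
    depends_on:
      db:
        # Start plugin_daemon only after db's healthcheck has passed.
        condition: service_healthy
```

With a fragment like this, docker compose up should hold plugin_daemon back until db is reported healthy; docker compose ps shows the current health state in the STATUS column.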

Closes #17927

Screenshots

Before

$ docker inspect docker-plugin_daemon-1 | jq '.[0].RestartCount'

✨ Restarted 7 times before it worked normally
7
$ docker compose logs plugin_daemon

✨ The 1st attempt
plugin_daemon-1  | 2025/04/12 08:36:03 pool.go:32: [INFO]init routine pool, size: 10000
plugin_daemon-1  | 
plugin_daemon-1  | 2025/04/12 08:36:03 /app/internal/db/pg/pg.go:14
plugin_daemon-1  | [error] failed to initialize database, got error failed to connect to `host=db user=postgres database=dify_plugin`: dial error (dial tcp 172.18.0.4:5432: connect: connection refused)
plugin_daemon-1  |
plugin_daemon-1  | 2025/04/12 08:36:03 /app/internal/db/pg/pg.go:18
plugin_daemon-1  | [error] failed to initialize database, got error failed to connect to `host=db user=postgres database=postgres`: dial error (dial tcp 172.18.0.4:5432: connect: connection refused)
plugin_daemon-1  | 2025/04/12 08:36:03 init.go:85: [PANIC]failed to init dify plugin db: failed to connect to `host=db user=postgres database=postgres`: dial error (dial tcp 172.18.0.4:5432: connect: connection refused)
plugin_daemon-1  | panic: [PANIC]failed to init dify plugin db: failed to connect to `host=db user=postgres database=postgres`: dial error (dial tcp 172.18.0.4:5432: connect: connection refused)
plugin_daemon-1  |
plugin_daemon-1  | goroutine 1 [running]:
plugin_daemon-1  | github.com/langgenius/dify-plugin-daemon/internal/utils/log.writeLog({0x1857285, 0x5}, {0x1888432?, 0x7?}, 0x1, {0xc00012fce0, 0x1, 0x1})
plugin_daemon-1  |      /app/internal/utils/log/log.go:40 +0x305
plugin_daemon-1  | github.com/langgenius/dify-plugin-daemon/internal/utils/log.Panic(...)
plugin_daemon-1  |      /app/internal/utils/log/log.go:66
plugin_daemon-1  | github.com/langgenius/dify-plugin-daemon/internal/db.Init(0x2710?)
plugin_daemon-1  |      /app/internal/db/init.go:85 +0x285
plugin_daemon-1  | github.com/langgenius/dify-plugin-daemon/internal/server.(*App).Run(0xc0001b4228, 0xc0002e7008)
plugin_daemon-1  |      /app/internal/server/server.go:74 +0xd2
plugin_daemon-1  | main.main()
plugin_daemon-1  |      /app/cmd/server/main.go:28 +0x125

✨ The 2nd attempt
plugin_daemon-1  | 2025/04/12 08:36:05 pool.go:32: [INFO]init routine pool, size: 10000
plugin_daemon-1  |
plugin_daemon-1  | 2025/04/12 08:36:05 /app/internal/db/pg/pg.go:14
plugin_daemon-1  | [error] failed to initialize database, got error failed to connect to `host=db user=postgres database=dify_plugin`: dial error (dial tcp 172.18.0.4:5432: connect: connection refused)
plugin_daemon-1  |
plugin_daemon-1  | 2025/04/12 08:36:05 /app/internal/db/pg/pg.go:18
plugin_daemon-1  | [error] failed to initialize database, got error failed to connect to `host=db user=postgres database=postgres`: dial error (dial tcp 172.18.0.4:5432: connect: connection refused)
plugin_daemon-1  | 2025/04/12 08:36:05 init.go:85: [PANIC]failed to init dify plugin db: failed to connect to `host=db user=postgres database=postgres`: dial error (dial tcp 172.18.0.4:5432: connect: connection refused)
plugin_daemon-1  | panic: [PANIC]failed to init dify plugin db: failed to connect to `host=db user=postgres database=postgres`: dial error (dial tcp 172.18.0.4:5432: connect: connection refused)
plugin_daemon-1  |
plugin_daemon-1  | goroutine 1 [running]:
plugin_daemon-1  | github.com/langgenius/dify-plugin-daemon/internal/utils/log.writeLog({0x1857285, 0x5}, {0x1888432?, 0x7?}, 0x1, {0xc00041fce0, 0x1, 0x1})
plugin_daemon-1  |      /app/internal/utils/log/log.go:40 +0x305
plugin_daemon-1  | github.com/langgenius/dify-plugin-daemon/internal/utils/log.Panic(...)
plugin_daemon-1  |      /app/internal/utils/log/log.go:66
plugin_daemon-1  | github.com/langgenius/dify-plugin-daemon/internal/db.Init(0x2710?)
plugin_daemon-1  |      /app/internal/db/init.go:85 +0x285
plugin_daemon-1  | github.com/langgenius/dify-plugin-daemon/internal/server.(*App).Run(0xc0001b6378, 0xc00026f008)
plugin_daemon-1  |      /app/internal/server/server.go:74 +0xd2
plugin_daemon-1  | main.main()
plugin_daemon-1  |      /app/cmd/server/main.go:28 +0x125

✨ The 3rd attempt
plugin_daemon-1  | 2025/04/12 08:36:06 pool.go:32: [INFO]init routine pool, size: 10000
plugin_daemon-1  |
plugin_daemon-1  | 2025/04/12 08:36:06 /app/internal/db/pg/pg.go:14
plugin_daemon-1  | [error] failed to initialize database, got error failed to connect to `host=db user=postgres database=dify_plugin`: dial error (dial tcp 172.18.0.4:5432: connect: connection refused)
plugin_daemon-1  |
plugin_daemon-1  | 2025/04/12 08:36:06 /app/internal/db/pg/pg.go:18
plugin_daemon-1  | [error] failed to initialize database, got error failed to connect to `host=db user=postgres database=postgres`: dial error (dial tcp 172.18.0.4:5432: connect: connection refused)
plugin_daemon-1  | 2025/04/12 08:36:06 init.go:85: [PANIC]failed to init dify plugin db: failed to connect to `host=db user=postgres database=postgres`: dial error (dial tcp 172.18.0.4:5432: connect: connection refused)
plugin_daemon-1  | panic: [PANIC]failed to init dify plugin db: failed to connect to `host=db user=postgres database=postgres`: dial error (dial tcp 172.18.0.4:5432: connect: connection refused)
plugin_daemon-1  |
plugin_daemon-1  | goroutine 1 [running]:
plugin_daemon-1  | github.com/langgenius/dify-plugin-daemon/internal/utils/log.writeLog({0x1857285, 0x5}, {0x1888432?, 0x7?}, 0x1, {0xc000513ce0, 0x1, 0x1})
plugin_daemon-1  |      /app/internal/utils/log/log.go:40 +0x305
plugin_daemon-1  | github.com/langgenius/dify-plugin-daemon/internal/utils/log.Panic(...)
plugin_daemon-1  |      /app/internal/utils/log/log.go:66
plugin_daemon-1  | github.com/langgenius/dify-plugin-daemon/internal/db.Init(0x2710?)
plugin_daemon-1  |      /app/internal/db/init.go:85 +0x285
plugin_daemon-1  | github.com/langgenius/dify-plugin-daemon/internal/server.(*App).Run(0xc000114b70, 0xc0001df008)
plugin_daemon-1  |      /app/internal/server/server.go:74 +0xd2
plugin_daemon-1  | main.main()
plugin_daemon-1  |      /app/cmd/server/main.go:28 +0x125

...
✨ Finally started normally
plugin_daemon-1  | 2025/04/12 08:36:20 pool.go:32: [INFO]init routine pool, size: 10000
plugin_daemon-1  |
plugin_daemon-1  | 2025/04/12 08:36:20 /app/internal/db/pg/pg.go:14
plugin_daemon-1  | [error] failed to initialize database, got error failed to connect to `host=db user=postgres database=dify_plugin`: server error (FATAL: database "dify_plugin" does not exist (SQLSTATE 3D000))     
plugin_daemon-1  | 2025/04/12 08:36:22 init.go:93: [INFO]dify plugin db initialized
plugin_daemon-1  | 2025/04/12 08:36:22 manager.go:159: [INFO]start plugin manager daemon...
plugin_daemon-1  | 2025/04/12 08:36:22 watcher.go:16: [INFO]start to handle new plugins in path: plugin
plugin_daemon-1  | 2025/04/12 08:36:22 init.go:19: [INFO]Persistence initialized
plugin_daemon-1  | [gnet] 2025-04-12T08:36:22.73824398Z INFO    logging/logger.go:256   Launching gnet with 8 event-loops, listening on: tcp://0.0.0.0:5003
plugin_daemon-1  | 2025/04/12 08:36:23 cluster_lifetime.go:113: [INFO]current node has become the master of the cluster

After

$ docker inspect docker-plugin_daemon-1 | jq '.[0].RestartCount'

✨ Never restarted
0
$ docker compose logs plugin_daemon

✨ Started normally on the 1st attempt, without any panics
plugin_daemon-1  | 2025/04/12 09:02:39 pool.go:32: [INFO]init routine pool, size: 10000
plugin_daemon-1  | 
plugin_daemon-1  | 2025/04/12 09:02:39 /app/internal/db/pg/pg.go:14
plugin_daemon-1  | [error] failed to initialize database, got error failed to connect to `host=db user=postgres database=dify_plugin`: server error (FATAL: database "dify_plugin" does not exist (SQLSTATE 3D000))     
plugin_daemon-1  | 2025/04/12 09:02:41 init.go:93: [INFO]dify plugin db initialized
plugin_daemon-1  | 2025/04/12 09:02:41 manager.go:159: [INFO]start plugin manager daemon...
plugin_daemon-1  | 2025/04/12 09:02:41 init.go:19: [INFO]Persistence initialized
plugin_daemon-1  | 2025/04/12 09:02:41 watcher.go:16: [INFO]start to handle new plugins in path: plugin
plugin_daemon-1  | [gnet] 2025-04-12T09:02:41.279542923Z        INFO    logging/logger.go:256   Launching gnet with 8 event-loops, listening on: tcp://0.0.0.0:5003
plugin_daemon-1  | 2025/04/12 09:02:41 cluster_lifetime.go:113: [INFO]current node has become the master of the cluster

Checklist

Important

Please review the checklist below before submitting your pull request.

  • This change requires a documentation update, included: Dify Document
  • I understand that this PR may be closed in case there was no previous discussion or issues. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.
  • I ran dev/reformat (backend) and cd web && npx lint-staged (frontend) to appease the lint gods

dosubot bot added the size:S and 🐞 bug labels Apr 12, 2025
jasonfish568 (Contributor)
This is exactly the problem that several other issues have reported: the plugin daemon could not connect to the DB.

kurokobo (Contributor, Author)

This PR does not fix any actual DB connection problem; it only suppresses startup errors that are safe to ignore.
However, it certainly makes it easier to differentiate between an actual issue with DB connection and merely insufficient wait time :)

jasonfish568 (Contributor)

Yes, that’s what I thought. I don’t think this is actually an error; the startup order just needs to be enforced.

jasonfish568 (Contributor)

This error has been mentioned a few times. @crazywoola, this is the issue you said you couldn’t replicate. It’s not a bug but a startup-order problem: the daemon starts before the DB is ready. I believe the daemon retries and eventually does connect to the DB.

dosubot bot added the lgtm label Apr 13, 2025
crazywoola merged commit bc57fa0 into langgenius:main Apr 13, 2025
8 checks passed
kurokobo deleted the daemon_depends_on_db branch April 13, 2025 02:41
Labels: 🐞 bug (Something isn't working), lgtm (This PR has been approved by a maintainer), size:S (This PR changes 10-29 lines, ignoring generated files)
Development

Successfully merging this pull request may close these issues.

The plugin_daemon container should be started after the database becomes healthy
3 participants