Skip to content

[Issue] Rewrote Magento\Framework\DB\Adapter\Pdo\Mysql->isTableExists #29662

Closed
@m2-assistant

Description

@m2-assistant

This issue is automatically created based on existing pull request: #28516: Rewrote Magento\Framework\DB\Adapter\Pdo\Mysql->isTableExists


Preconditions

Magento 2.4-develop

Description

During heavier load on newer productive Magento2 installations (2.3 with MSI or 2.3.4+), we noticed increased execution time was spent in SHOW TABLE STATUS queries to MySQL. Analyzing this further, we found that these queries seemed to occur in higher frequency than in earlier Magento versions. This can be problematic, since the MySQL server uses an exclusive file system lock and a call to readdir() to answer them. Because of this, these calls can be slow. We also found that one web request can result in multiple calls of SHOW TABLE STATUS for the same tables, which seems very wasteful and inefficient.

In more detailed analysis of some of our customers, we found instances where SHOW TABLE STATUS queries were executed nearly 100 times when requesting a single category page with 48 products.

Looking at the code of the MySQL adapter, we identified several places that use the function showTablesStatus(), among them isTableExists().
With this patch, we eliminate the calls of SHOW TABLE STATUS for the common case of checking table existence.

Implementation details

In our implementation of isTableExists(), we use information from the INFORMATION_SCHEMA.TABLES table, which is provided without expensive file system operations and proved to be significantly faster in our tests (see below for details). Additionally, we use the already existing DDL cache to store a positive table existence, resulting in even fewer database calls.

According to the official MySQL documentation[link], using the views from INFORMATION_SCHEMA is actually recommended over using SHOW queries, since this has some advantages. INFORMATION_SCHEMA views are read-only and respect the access rights of the database user, so it should be readily available on supported MySQL platforms.

The raw performance advantages of using INFORMATION_SCHEMA over SHOW TABLES STATUS can easily be measured with MySQL itself. We’ve used MySQL 5.7.30 on Ubuntu 18.04, with datadir running on a high performance NVME SSD (Samsung PM1725b).
We’ve run 10 concurrent clients each executing 1,000,000 statements in a Magento database with demo data, once with the query SHOW TABLES STATUS LIKE ‘catalog_product_entity’; and once with our statement SELECT COUNT(1) AS tbl_exists FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = ‘catalog_product_entity’ AND TABLE_SCHEMA = DATABASE();.

Between these queries, we measured a difference of slightly over factor 10 in the execution time, favoring our solution with INFORMATION_SCHEMA. See below for details.
We assume the difference is even bigger when having more concurrent clients or slower hardware. As MySQL’s “SHOW TABLE STATUS” implementation is reading the whole directory, file system locking can have a stronger effect as well as which file system is used.

Our performance measurements:

  • create the files test1.sql and test2.sql with 1.000.000 rows of the statements to test.
  • run these commands
    • time seq 1 10 | parallel -j 10 bash -c 'mysql < test1.sql > /dev/null’
    • time seq 1 10 | parallel -j 10 bash -c 'mysql < test2.sql > /dev/null’

Results:
For SHOW TABLE STATUS:

# time seq 1 10 |  parallel  -j 10 bash -c ' mysql   < test1.sql  > /dev/null’ 

real	85m21.374s
user	3m5.500s
sys	1m40.317s

For INFORMATION_SCHEMA:

time seq 1 10 |  parallel  -j 10 bash -c ' mysql   < test2.sql > /dev/null 


real	8m5.184s
user	1m54.587s
sys	1m11.687s

Steps to reproduce:

To see the impact our patch has, load the same set of pages on the frontend with and without the patch on any Magento instance (with real or demo data). The difference can be seen when analyzing the amount of SQL queries sent to the database and the time spent processing them. Here’s an easy way to do this:

Using Slow Query Log, you can make MySQL actually log all queries by setting the threshold to 0 seconds:

SET @@GLOBAL.long_query_time = 0; SET @@GLOBAL.slow_query_log = 1;

After you’ve opened the selected set of frontend pages, deactivate it again:

SET @@GLOBAL.slow_query_log = 0;

Now, using pt-query-digest from the Percona Toolkit on the slow query log file, you get an analysis of the logged queries, with queries grouped by type, their average and cumulative run time, and other information. Make sure to use two different log files so that they can easily be compared between M2 with and without our patch.

In the analysis, you will usually see SHOW TABLE STATUS queries appearing in the top queries (this depends a bit on the module configuration and of course on which and how many pages you accessed for this test), meaning they're taking quite a good percentage of total execution time.

Please see these extracts from the pt-query-digest output for an example, which we created when simply opening the single category page “Men > Tops” from the demo data (https://$BASEURL/men/tops-men.html?product_list_limit=36).

Actual Result:

Analysis of SQL queries run, unpatched Magento:

# Profile
# Rank Query ID                         Response time Calls R/Call V/M   I
# ==== ================================ ============= ===== ====== ===== =
#    1 0xAADD55611E492BA965AF2F067A0...  0.0099 34.3%    22 0.0004  0.00 SELECT catalog_product_index_eav search_tmp_?ed?d?dcc?b?_? cataloginventory_stock_status
#    2 0x7009A071F49236A84A57562D461...  0.0050 17.2%     1 0.0050  0.00 SELECT catalog_product_entity catalog_category_product_index_store? catalog_product_index_price review_entity_summary review_entity cataloginventory_stock_status search_tmp_?ed?d?dccc?_?
#    3 0x8085D806F3631D0D30FE5C20326...  0.0036 12.5%     3 0.0012  0.00 SHOW TABLE STATUS
#    4 0x13262D434B03958CB4F62EC706B...  0.0011  3.9%     1 0.0011  0.00 INSERT SELECT search_tmp_?ed?d?dcc?b?_? catalog_product_index_eav cataloginventory_stock_status catalog_category_product_index_store?
#    5 0x2F9EDAB00084ED0A7EE1463DBA4...  0.0009  3.1%    72 0.0000  0.00 SELECT inventory_stock_sales_channel
[...]

You can see that queries of the type SHOW TABLE STATUS are the third most expensive queries with 12.5% of the total execution time inside the database and 3 calls in total just for this single page load.

Expected Result:

Patched Magento, opening the same category page:

# Profile
# Rank Query ID                         Response time Calls R/Call V/M   I
# ==== ================================ ============= ===== ====== ===== =
#    1 0xAADD55611E492BA965AF2F067A0...  0.0105 29.9%    22 0.0005  0.00 SELECT catalog_product_index_eav search_tmp_?ed?cf?f?c?a?_? cataloginventory_stock_status
#    2 0x8EA4DF263A535F59EAC6ED75B20...  0.0073 20.7%    24 0.0003  0.00 SELECT catalog_product_entity catalog_product_super_link catalog_product_website catalog_product_entity_int cataloginventory_stock_status
#    3 0x7009A071F49236A84A57562D461...  0.0050 14.2%     1 0.0050  0.00 SELECT catalog_product_entity catalog_category_product_index_store? catalog_product_index_price review_entity_summary review_entity cataloginventory_stock_status search_tmp_?ed?cf?f?e?b?_?
#    4 0x13262D434B03958CB4F62EC706B...  0.0015  4.3%     1 0.0015  0.00 INSERT SELECT search_tmp_?ed?cf?f?c?a?_? catalog_product_index_eav cataloginventory_stock_status catalog_category_product_index_store?
#    5 0xCB3E7FA2C499AC0B69ACCD06C21...  0.0012  3.4%     2 0.0006  0.00 SELECT catalog_product_index_price search_tmp_?ed?cf?f?c?a?_?
#    6 0x2F9EDAB00084ED0A7EE1463DBA4...  0.0009  2.7%    72 0.0000  0.00 SELECT inventory_stock_sales_channel
#    7 0x593763BE686057BEA87DB0BC232...  0.0008  2.4%    72 0.0000  0.00 SELECT inventory_stock_sales_channel
#    8 0x23B42F57B465E32E1732E4070E9...  0.0008  2.2%    72 0.0000  0.00 SELECT inventory_stock
#    9 0x11C6E1A6D7347EEAF29EBEA1350...  0.0008  2.2%     1 0.0008  0.00 SELECT catalog_product_index_eav cataloginventory_stock_status catalog_category_product_index_store?
#   10 0xA788DCC6FE9679C3FDABA74662A...  0.0007  1.9%     1 0.0007  0.00 SELECT catalog_product_entity catalog_category_product_index_store? catalog_product_index_price cataloginventory_stock_status search_tmp_?ed?cf?f?e?b?_?
#   11 0x961084F55005407CD1C1E76C0BE...  0.0006  1.7%     2 0.0003  0.00 SELECT eav_attribute_option_swatch
#   12 0xE29FDCAD0B27AFD15D17697A415...  0.0005  1.5%     1 0.0005  0.00 SELECT catalog_product_index_price search_tmp_?ed?cf?f?c?a?_?
#   13 0x233B56CFFF59C63BF02E08D9654...  0.0005  1.4%     2 0.0002  0.00 CREATE TABLE search_tmp_?ed?cf?f?e?b?_? `search_tmp_5ed8cf6f89e2b9_26325963`
#   14 0x2F0A3214F96ED5CD4E962B1E990...  0.0005  1.3%    24 0.0000  0.00 SELECT catalog_product_super_attribute catalog_product_entity catalog_product_super_link eav_attribute catalog_product_entity catalog_product_entity_int catalog_product_super_attribute_label eav_attribute_option
#   15 0xDD45A442CF95F62A11D72D7D0F8...  0.0004  1.2%     1 0.0004  0.00 SELECT catalog_category_product_index_store? search_tmp_?ed?cf?f?c?a?_? catalog_category_entity
#   16 0x616C07D9410BE0BBA2EBCD6D77C...  0.0003  0.8%    21 0.0000  0.00 SELECT eav_attribute
#   17 0xB3E930152F4418A1C656779CE44...  0.0003  0.7%    12 0.0000  0.00 SELECT catalog_product_super_attribute
#   18 0xDFE08A11F72783E309DFA58BA8F...  0.0003  0.7%    12 0.0000  0.00 SELECT catalog_category_product
#   19 0x58EA3033F7E981C7FACDF0DE8D3...  0.0002  0.7%    20 0.0000  0.00 SELECT catalog_eav_attribute
#   20 0x13867189A41845FE8E8C4366F7B...  0.0002  0.5%    12 0.0000  0.00 SELECT catalog_product_super_attribute_label
# MISC 0xMISC                            0.0020  5.6%    74 0.0000   0.0 <38 ITEMS>

We’ve had to include the full summary in this case to show that our implementation not only eliminates the common cause for SHOW TABLE STATUS queries, but also that the new queries to INFORMATION_SCHEMA.TABLES do not occur in the top 20 queries.
This already illustrates that we’ve found a good solution to improve the performance of table existence checks. The complete impact can be estimated from observing production stores with many concurrent page views, where we see thousands of SHOW TABLE STATUS queries in 2–3 minutes.

Questions or comments

  • The commit in this pull request is a combined work of @jonashrem and @leeps.
  • We haven’t created an issue for this improvement, but can do that if it’s needed/wanted.

Contribution checklist

  • Pull request has a meaningful description of its purpose
  • All commits are accompanied by meaningful commit messages
  • All new or changed code is covered with unit/integration tests (if applicable)
  • All automated tests passed successfully (all builds are green)

Metadata

Metadata

Assignees

Labels

Component: DBFixed in 2.4.xThe issue has been fixed in 2.4-develop branchIssue: ConfirmedGate 3 Passed. Manual verification of the issue completed. Issue is confirmedIssue: Format is validGate 1 Passed. Automatic verification of issue format passedIssue: Ready for WorkGate 4. Acknowledged. Issue is added to backlog and ready for developmentPriority: P2A defect with this priority could have functionality issues which are not to expectations.Progress: doneReported on 2.3.4Indicates original Magento version for the Issue report.Reproduced on 2.4.xThe issue has been reproduced on latest 2.4-develop branchSeverity: S2Major restrictions or short-term circumventions are required until a fix is available.

Type

No type

Projects

Status

Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions