Score:0

MySQL - Select queries 10x slower on Azure VM vs on-prem VM


We have been working on a project to migrate a MySQL database from an on-premise Linux server to a Windows VM on Azure (IaaS). (We have a specific reason for choosing IaaS over the Azure MySQL PaaS offering.)

After the migration, we see that the queries on the MySQL database are significantly slower (about 10x) on the new server. The VM is configured with 64 CPUs and 256 GB RAM (the on-premise VM had 48 CPUs and 256 GB RAM).

All the tables in the database are using the InnoDB engine. We have read up quite a lot about query slowness with InnoDB tables, and most of it seems to be pointing to the innodb_buffer_pool_size - which I have already configured to 185 GB (roughly 70% of the total RAM). We have also tried making a number of other changes in the my.ini configuration like

key_buffer_size = 20M
innodb_io_capacity = 2000
query_cache_size = 0
query_cache_type = 0
increasing thread_cache_size
innodb_read_io_threads
innodb_write_io_threads

But none of these changes has helped query performance.

We have compared the indexes on both the servers, and they're the same. And at a high level, it doesn't look like the indexes are broken on the Azure VM. Also, we are trying to measure the performance by running MySQL workbench inside the Azure VM, so network bandwidth should not be an issue.

  • Can anyone suggest any other options that we could try to improve the performance?

A couple of further points.

  • What we notice is that although some queries take 30+ minutes to run (the same queries run in about 5 minutes on the on-premise server), the CPU usage on the VM remains very low (less than 10%). Is there a setting, similar to innodb_buffer_pool_size for memory, that allocates some amount of CPU to the MySQL server?

Like I mentioned before, the on-premise VM is Linux-based and the Azure VM is running on Windows - could that be a problem? I can't find any definitive proof that MySQL on Windows will cause such severe performance degradation.

My full my.ini configuration is below:

# Other default tuning values
# MySQL Server Instance Configuration File
# ---------------------------------------------
# Generated by the MySQL Server Instance Configuration Wizard
#
#
# Installation Instructions
# ---------------------------------------------
#
# On Linux you can copy this file to /etc/my.cnf to set global options,
# mysql-data-dir/my.cnf to set server-specific options
# (@localstatedir@ for this installation) or to
# ~/.my.cnf to set user-specific options.
#
# On Windows you should keep this file in the installation directory 
# of your server (e.g. C:\Program Files\MySQL\MySQL Server X.Y). To
# make sure the server reads the config file use the startup option 
# "--defaults-file". 
#
# To run the server from the command line, execute this in a 
# command line shell, e.g.
# mysqld --defaults-file="C:\Program Files\MySQL\MySQL Server X.Y\my.ini"
#
# To install the server as a Windows service manually, execute this in a 
# command line shell, e.g.
# mysqld --install MySQLXY --defaults-file="C:\Program Files\MySQL\MySQL Server X.Y\my.ini"
#
# And then execute this in a command line shell to start the server, e.g.
# net start MySQLXY
#
#
# Guidelines for editing this file
# ---------------------------------------------
#
# In this file, you can use all long options that the program supports.
# If you want to know the options a program supports, start the program
# with the "--help" option.
#
# More detailed information about the individual options can also be
# found in the manual.
#
# For advice on how to change settings please see
# https://dev.mysql.com/doc/refman/5.7/en/server-configuration-defaults.html
#
#
# CLIENT SECTION
# ---------------------------------------------
#
# The following options will be read by MySQL client applications.
# Note that only client applications shipped by MySQL are guaranteed
# to read this section. If you want your own MySQL client program to
# honor these values, you need to specify it as an option during the
# MySQL client library initialization.
#
[client]

# pipe=

# socket=MYSQL

port=3306

[mysql]
no-beep

# default-character-set=

# SERVER SECTION
# ---------------------------------------------
#
# The following options will be read by the MySQL Server. Make sure that
# you have installed the server correctly (see above) so it reads this 
# file.
#
# server_type=1
[mysqld]
plugin-load-add=validate_password.dll
validate-password=FORCE_PLUS_PERMANENT
# The next three options are mutually exclusive to SERVER_PORT below.
# skip-networking
# enable-named-pipe
# shared-memory

# shared-memory-base-name=MYSQL

# The Pipe the MySQL Server will use
# socket=MYSQL

# The TCP/IP Port the MySQL Server will listen on
port=3306

# Path to installation directory. All paths are usually resolved relative to this.
# basedir="C:/Program Files/MySQL/MySQL Server 5.7/"

# Path to the database root
datadir=F:\MySQL\Data

# The default character set that will be used when a new schema or table is
# created and no character set is defined
# character-set-server=

# The default storage engine that will be used when create new tables when
default-storage-engine=INNODB

# Set the SQL mode to strict
sql-mode="STRICT_TRANS_TABLES,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"

# General and Slow logging.
log-output=FILE

general-log=0

general_log_file="ZEUWPJIRA01.log"

slow-query-log=1

slow_query_log_file="ZEUWPJIRA01-slow.log"

long_query_time=10

# Error Logging.
log-error="ZEUWPJIRA01.err"

# ***** Group Replication Related *****
# Specifies the base name to use for binary log files. With binary logging
# enabled, the server logs all statements that change data to the binary
# log, which is used for backup and replication.
# log-bin

# ***** Group Replication Related *****
# Specifies the server ID. For servers that are used in a replication topology,
# you must specify a unique server ID for each replication server, in the
# range from 1 to 2^32 − 1. “Unique” means that each ID must be different
# from every other ID in use by any other replication source or replica.
server-id=26
# ***** Group Replication Related *****
# The host name or IP address of the replica to be reported to the source
# during replica registration. This value appears in the output of SHOW SLAVE HOSTS
# on the source server. Leave the value unset if you do not want the replica to
# register itself with the source.
# report_host=0.0

# ***** Group Replication Related *****
# Defines the algorithm used to hash the writes extracted during a transaction. If you
# are using Group Replication, this variable must be set to XXHASH64 because the process
# of extracting the writes from a transaction is required for conflict detection on all
# group members.
# transaction_write_set_extraction=0.0
lower_case_table_names=1

# Secure File Priv.
secure-file-priv="C:/ProgramData/MySQL/MySQL Server 5.7/Uploads"

# The maximum amount of concurrent sessions the MySQL server will
# allow. One of these connections will be reserved for a user with
# SUPER privileges to allow the administrator to login even if the
# connection limit has been reached.
#max_connections=151
#[29062021]changed by Sridharan to increase max concurrent sessions
max_connections=1000
#[29062021]changed by Sridharan to increase max connection errors
max_connect_errors=500

# The number of open tables for all threads. Increasing this value
# increases the number of file descriptors that mysqld requires.
# Therefore you have to make sure to set the amount of open files
# allowed to at least 4096 in the variable "open-files-limit" in
# section [mysqld_safe]
#table_open_cache=2000
table_open_cache =2048
table_definition_cache = 2048
myisam_sort_buffer_size = 8M
#skip-external-locking
# Maximum size for internal (in-memory) temporary tables. If a table
# grows larger than this value, it is automatically converted to disk
# based table This limitation is for a single table. There can be many
# of them.
tmp_table_size=4G

# How many threads we should keep in a cache for reuse. When a client
# disconnects, the client's threads are put in the cache if there aren't
# more than thread_cache_size threads from before.  This greatly reduces
# the amount of thread creations needed if you have a lot of new
# connections. (Normally this doesn't give a notable performance
# improvement if you have a good thread implementation.)
# thread_cache_size=10
thread_cache_size=64
query_cache_size =0
query_cache_type = 0

#*** MyISAM Specific options
# The maximum size of the temporary file MySQL is allowed to use while
# recreating the index (during REPAIR, ALTER TABLE or LOAD DATA INFILE.
# If the file-size would be bigger than this, the index will be created
# through the key cache (which is slower).
myisam_max_sort_file_size=100G

# The size of the buffer that is allocated when sorting MyISAM indexes
# during a REPAIR TABLE or when creating indexes with CREATE INDEX
# or ALTER TABLE.
myisam_sort_buffer_size=6G

# Size of the Key Buffer, used to cache index blocks for MyISAM tables.
# Do not set it larger than 30% of your available memory, as some memory
# is also required by the OS to cache rows. Even if you're not using
# MyISAM tables, you should still set it to 8-64M as it will also be
# used for internal temporary disk tables.
key_buffer_size=20M

# Size of the buffer used for doing full table scans of MyISAM tables.
# Allocated per thread, if a full scan is needed.
read_buffer_size=64K

read_rnd_buffer_size=256K
tmp_table_size=64M
max_heap_table_size=64M

#*** INNODB Specific options ***
# innodb_data_home_dir=

# Use this option if you have a MySQL server with InnoDB support enabled
# but you do not plan to use it. This will save memory and disk space
# and speed up some things.
# skip-innodb

# If set to 1, InnoDB will flush (fsync) the transaction logs to the
# disk at each commit, which offers full ACID behavior. If you are
# willing to compromise this safety, and you are running small
# transactions, you may set this to 0 or 2 to reduce disk I/O to the
# logs. Value 0 means that the log is only written to the log file and
# the log file flushed to disk approximately once per second. Value 2
# means the log is written to the log file at each commit, but the log
# file is only flushed to disk approximately once per second.
innodb_flush_log_at_trx_commit=2

# The size of the buffer InnoDB uses for buffering log data. As soon as
# it is full, InnoDB will have to flush it to disk. As it is flushed
# once per second anyway, it does not make sense to have it very large
# (even with long transactions).
innodb_log_buffer_size=200M

# InnoDB, unlike MyISAM, uses a buffer pool to cache both indexes and
# row data. The bigger you set this the less disk I/O is needed to
# access data in tables. On a dedicated database server you may set this
# parameter up to 80% of the machine physical memory size. Do not set it
# too large, though, because competition of the physical memory may
# cause paging in the operating system.  Note that on 32bit systems you
# might be limited to 2-3.5G of user level memory per process, so do not
# set it too high.
#[29062021]changed by Sridharan to increase innodb buffer pool size from 16G to 24G
#[06072021]changed by Sridharan to increase innodb buffer pool size from 24G to 48G
innodb_buffer_pool_size=185G

# Size of each log file in a log group. You should set the combined size
# of log files to about 25%-100% of your buffer pool size to avoid
# unneeded buffer pool flush activity on log file overwrite. However,
# note that a larger logfile size will increase the time needed for the
# recovery process.
innodb_log_file_size=4G

# Number of threads allowed inside the InnoDB kernel. The optimal value
# depends highly on the application, hardware as well as the OS
# scheduler properties. A too high value may lead to thread thrashing.
#innodb_thread_concurrency=17
innodb_thread_concurrency=32

# The increment size (in MB) for extending the size of an auto-extend InnoDB system tablespace file when it becomes full.
innodb_autoextend_increment=64

# The number of regions that the InnoDB buffer pool is divided into.
# For systems with buffer pools in the multi-gigabyte range, dividing the buffer pool into separate instances can improve concurrency,
# by reducing contention as different threads read and write to cached pages.
#innodb_buffer_pool_instances=8
#[29062021 Sridharan] increased buffer pool instances from 8 to 12
innodb_buffer_pool_instances=12

#[29062021 Sridharan] added the io_capacity, read_io_threads and write_io_threads configurations to speed-up queries
innodb_read_io_threads = 32
innodb_write_io_threads = 32
innodb_io_capacity = 2000
innodb_io_capacity_max = 4000

# Determines the number of threads that can enter InnoDB concurrently.
innodb_concurrency_tickets=5000

# Specifies how long in milliseconds (ms) a block inserted into the old sublist must stay there after its first access before
# it can be moved to the new sublist.
innodb_old_blocks_time=1000

# It specifies the maximum number of .ibd files that MySQL can keep open at one time. The minimum value is 10.
innodb_open_files=300

# When this variable is enabled, InnoDB updates statistics during metadata statements.
innodb_stats_on_metadata=0

# When innodb_file_per_table is enabled (the default in 5.6.6 and higher), InnoDB stores the data and indexes for each newly created table
# in a separate .ibd file, rather than in the system tablespace.
innodb_file_per_table=1

# Use the following list of values: 0 for crc32, 1 for strict_crc32, 2 for innodb, 3 for strict_innodb, 4 for none, 5 for strict_none.
innodb_checksum_algorithm=0

# The number of outstanding connection requests MySQL can have.
# This option is useful when the main MySQL thread gets many connection requests in a very short time.
# It then takes some time (although very little) for the main thread to check the connection and start a new thread.
# The back_log value indicates how many requests can be stacked during this short time before MySQL momentarily
# stops answering new requests.
# You need to increase this only if you expect a large number of connections in a short period of time.
back_log=80

# If this is set to a nonzero value, all tables are closed every flush_time seconds to free up resources and
# synchronize unflushed data to disk.
# This option is best used only on systems with minimal resources.
flush_time=0

# The minimum size of the buffer that is used for plain index scans, range index scans, and joins that do not use
# indexes and thus perform full table scans.
join_buffer_size=256K

# The maximum size of one packet or any generated or intermediate string, or any parameter sent by the
# mysql_stmt_send_long_data() C API function.
max_allowed_packet=1024M

# If more than this many successive connection requests from a host are interrupted without a successful connection,
# the server blocks that host from performing further connections.
max_connect_errors=100

# Changes the number of file descriptors available to mysqld.
# You should try increasing the value of this option if mysqld gives you the error "Too many open files".
open_files_limit=4161

# If you see many sort_merge_passes per second in SHOW GLOBAL STATUS output, you can consider increasing the
# sort_buffer_size value to speed up ORDER BY or GROUP BY operations that cannot be improved with query optimization
# or improved indexing.
sort_buffer_size=256K

# The number of table definitions (from .frm files) that can be stored in the definition cache.
# If you use a large number of tables, you can create a large table definition cache to speed up opening of tables.
# The table definition cache takes less space and does not use file descriptors, unlike the normal table cache.
# The minimum and default values are both 400.
table_definition_cache=1400

# Specify the maximum size of a row-based binary log event, in bytes.
# Rows are grouped into events smaller than this size if possible. The value should be a multiple of 256.
binlog_row_event_max_size=8K

# If the value of this variable is greater than 0, a replica synchronizes its master.info file to disk.
# (using fdatasync()) after every sync_master_info events.
sync_master_info=10000

# If the value of this variable is greater than 0, the MySQL server synchronizes its relay log to disk.
# (using fdatasync()) after every sync_relay_log writes to the relay log.
sync_relay_log=10000

# If the value of this variable is greater than 0, a replica synchronizes its relay-log.info file to disk.
# (using fdatasync()) after every sync_relay_log_info transactions.
sync_relay_log_info=10000

# Load mysql plugins at start."plugin_x ; plugin_y".
# plugin_load

# The TCP/IP Port the MySQL Server X Protocol will listen on.
# loose_mysqlx_port=33060
# relay-log=C:/ProgramData/MySQL/MySQL Server 5.7/Data/mysql-relay-bin.log
relay-log=F:\MySQL\Data\mysql-relay-bin.log

# log_bin=C:/ProgramData/MySQL/MySQL Server 5.7/Data/mysql-bin.log
log_bin=F:\MySQL\Data\mysql-bin.log

binlog_do_db=jira_datacenter
binlog_format=MIXED
ua flag
Please show us that 30-minute query and the relevant `SHOW CREATE TABLE` output. Also, how big is the table, and how big is the result set?
Sridharan Srinivasan
Thanks for the response, Rick. The 30-minute query is just an example. Even a query returning 6K records takes about 2 minutes on the new server, while the same query runs in less than 10 seconds on the on-prem server. That is why I am not going into the specifics of queries, tables and indexes: the problem looks like it is somewhere at the server level.
ua flag
In my experience, you can't "tune your way out of a performance problem". Meanwhile, there are many ways to write inefficient queries and/or fail to have adequate indexes. I realize the machines are showing radically different performance, but it would really help to see what the query is doing in order to know where to look.
Score:0

I would suggest that you initially duplicate your existing on-prem MySQL configuration onto your Azure VM, with all the same settings. That way, you're comparing like with like.
Right now, you have too many changes happening at the same time and you can't see which one(s) are causing the problem.
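To make the like-for-like comparison concrete, here is a minimal Python sketch that reads the [mysqld] section of two option files and reports the settings that differ. The function names are hypothetical (they are not part of MySQL or any library), the Azure sample reuses a few lines from the posted my.ini, and the on-prem values are invented for illustration. It relies on the documented behaviour that when an option appears more than once in an option file, the last occurrence takes precedence, and it lists such repeats as well, since the posted file sets tmp_table_size and max_connect_errors twice.

```python
import re


def effective_settings(text, section="mysqld"):
    """Return (settings, duplicates) for one section of an option file.

    A repeated option keeps its last value, mirroring how MySQL applies
    the last occurrence of an option.
    """
    settings, duplicates = {}, {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # skip blank lines and whole-line comments
        header = re.fullmatch(r"\[(.+)\]", line)
        if header:
            current = header.group(1).strip().lower()
            continue
        if current != section:
            continue
        name, _, value = line.partition("=")
        # Dashes and underscores are interchangeable in option names.
        name = name.strip().lower().replace("-", "_")
        value = value.strip().strip('"')
        if name in settings:
            duplicates.setdefault(name, [settings[name]]).append(value)
        settings[name] = value
    return settings, duplicates


def diff_settings(a, b):
    """Map each option whose value differs to (value_in_a, value_in_b)."""
    return {k: (a.get(k), b.get(k))
            for k in sorted(set(a) | set(b)) if a.get(k) != b.get(k)}


# A few lines taken from the posted my.ini (Azure VM).
azure_text = """
[mysqld]
tmp_table_size=4G
max_connect_errors=500
tmp_table_size=64M
max_connect_errors=100
"""

# Invented on-prem values, for illustration only.
onprem_text = """
[mysqld]
tmp_table_size = 4G
max_connect_errors = 500
"""

azure, azure_dupes = effective_settings(azure_text)
onprem, _ = effective_settings(onprem_text)

print("Repeated options on Azure:", azure_dupes)
print("Differences (azure, on-prem):", diff_settings(azure, onprem))
```

The sketch compares values as plain strings, so 64M and 67108864 would be reported as different, and it ignores trailing comments on a setting line. Comparing the output of SHOW GLOBAL VARIABLES on both servers would sidestep both limitations.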

... we notice is that although some of the queries take 30+ minutes to run, (they seem to be running in the on-prem server in just 5 minutes) ...

Even 5 minutes is a long time for a query to run.
Can these queries be tuned, even in the on-prem instance?
Are they pulling large amounts of data across the network? Simplistically, your Azure VM is on the end of a longer piece of electronical string than your on-prem instance (and may be subject to all kinds of network routing, traffic management, etc.).

... the CPU usage on the VM remains very low ...

Again, that sounds like a lot of data being moved around with very little server-side processing involved.
Are the disks busy? Queries that do lots of updates or use lots of transient result sets will thrash the disks without using much CPU at all. (Conversely, lots of [in-memory] table-scanning usually evidences itself by high CPU load and low disk activity).
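One way to put a number on "disk-bound versus CPU-bound" is to sample two InnoDB counters from SHOW GLOBAL STATUS before and after running the slow query: Innodb_buffer_pool_read_requests (logical reads) and Innodb_buffer_pool_reads (logical reads that could not be served from the buffer pool and went to disk). The Python sketch below only does the arithmetic. The function name is hypothetical and the sample counter values are invented.

```python
def buffer_pool_miss_ratio(before, after):
    """Fraction of logical reads that went to disk between two samples.

    `before` and `after` are dicts of counters copied from
    SHOW GLOBAL STATUS, taken before and after the query ran.
    """
    requests = (after["Innodb_buffer_pool_read_requests"]
                - before["Innodb_buffer_pool_read_requests"])
    disk_reads = (after["Innodb_buffer_pool_reads"]
                  - before["Innodb_buffer_pool_reads"])
    if requests <= 0:
        return 0.0  # no read activity between the two samples
    return disk_reads / requests


# Invented sample values, for illustration only.
before = {"Innodb_buffer_pool_read_requests": 1_000_000,
          "Innodb_buffer_pool_reads": 10_000}
after = {"Innodb_buffer_pool_read_requests": 5_000_000,
         "Innodb_buffer_pool_reads": 410_000}

ratio = buffer_pool_miss_ratio(before, after)
print(f"{ratio:.1%} of logical reads went to disk")
```

These counters are global, so the figure is only meaningful if little else is running on the server while you measure. A high ratio together with low CPU would point at the storage, while a ratio near zero would suggest looking elsewhere.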

Sridharan Srinivasan
Thanks a lot for the response. We have compared the configurations on both servers and made the Azure setup as close to the on-prem server as possible. The query takes 5 minutes on-prem because it pulls close to 4 million records, and this is just an example: even a query returning 6K records takes about 2 minutes on the new server. These are the runtimes measured directly on the Azure VM (that is, while logged in to the VM), so network bandwidth does not come into the picture. As for the disk, read IOPS go up to about 2K, but the disk can handle up to 7K.
Score:0
  • Pick one query where there's a significant difference between on-prem and Azure, and focus on that. It's likely that in resolving this one query you'll help resolve issues with all others (as here we're not talking about changing the query or adding indexes as you may if optimising the query, but rather we're looking at infrastructure differences).
  • Use a resource monitor to see where the limiting factors are. For example, you said that CPU isn't high, so it's likely not that, unless the query has to run single-threaded and so only loads one of your logical cores, in which case the average CPU figure is misleading. Check all logical cores rather than the average.
  • A big difference with VMs is that disk isn't local to the compute resource, so this is a likely culprit.
    • For SQL Server, I know you can host your temporary database files on the VM's D drive (the drive that's attached locally to the hypervisor, but which gets wiped if you deallocate your VM or it gets moved to a different hypervisor). This is OK since the temporary database doesn't contain anything we need to persist and can be recreated on startup, so we don't lose anything. I don't know enough about MySQL to say for sure whether the same is true, but I believe it places its temp files in the location defined by the TMPDIR system environment variable, so you may want to try pointing that to the D drive to see if that helps. Seek advice from a MySQL expert to confirm whether this is acceptable, or test thoroughly, including fully stopping (not just restarting) a POC VM with this setup, to ensure everything comes back up as expected.
    • Given you're taking a hit in IO due to the latency between the data disks and the VM, try to offset this by opting for disks with higher IOPS, i.e. Premium SSD.
    • Consider amending your disk's host caching to ReadOnly. Again, here it's worth consulting a MySQL expert; certainly this helps SQL Server with which I'm more familiar; here our data disks would have host caching set to ReadOnly, whilst our log files would be written to disks with host caching set to None.
  • You also changed your OS; so there could be a lot of things going on here that you didn't have before.
    • Ensure you've set the relevant exclusions in your AV software for MySQL
    • Review what else is running on the VM; e.g. Windows comes with a load of services which may be set to run automatically but aren't required for your scenario. The Windows Search service used to be one such culprit where present.
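To illustrate the per-core point above: on the 64-CPU VM from the question, one fully loaded core adds only 100/64, or about 1.6 percentage points, to the average, so an average below 10% can hide a saturated core. The Python sketch below takes a list of per-core utilisation percentages (which you would read from Task Manager or Performance Monitor) and reports the average alongside any busy cores. The function name, the 90% threshold and the sample readings are all assumptions made for illustration.

```python
def cpu_summary(per_core_percent, busy_threshold=90.0):
    """Return (average utilisation, indexes of cores at or above threshold)."""
    average = sum(per_core_percent) / len(per_core_percent)
    busy = [i for i, pct in enumerate(per_core_percent)
            if pct >= busy_threshold]
    return average, busy


# Invented readings: one saturated core, 63 nearly idle ones.
cores = [100.0] + [2.0] * 63

average, busy = cpu_summary(cores)
print(f"average {average:.1f}%, busy cores: {busy}")
```

If a check like this shows one core pinned while the rest are idle, the query is limited by single-thread speed rather than by the number of cores.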