Troubleshooting

Troubleshooting is a process of identifying and solving a problem. It begins by answering the following questions:

  • What are the symptoms?
  • Who or what is reporting the problem?
  • Where is the problem happening?
  • Is it specific to a platform, environment, application, or something else?
  • When does the problem happen?
  • Does it happen at a specific time, multiple times, or under specific conditions?
  • Under which conditions does the problem happen?
  • Is the problem reproducible?

Common Issues


The most common problems encountered with databases are typically caused by one or more of the following:

Poor Performance

  • Poor performance can result in slow response to user queries or applications accessing the database.
  • Improper Configuration. Improperly configured clients, servers, or databases can cause a wide range of problems, including poor performance, crashes, errors, or even database corruption.
  • Poor Connectivity. Poor connectivity can cause poor performance, time outs, or a variety of errors when interacting with the database.
  • Poor performance is usually caused by high latency for disk reads and writes, slow processing time by the server, or a poor connection between the server and client.

Any of these could be rooted in one or more of the following:

Improper Configuration

  • Inadequate server hardware or configuration. For example, the server could be running out of disk space, memory, or processing power.
    • Increasing these resources on the database server is called vertical scaling.
    • Adding additional database partitions and shards, to improve performance is called horizontal scaling.
  • A poorly configured database may be operational but still unable to meet demand.
    • For example, it may need to allow more connections, or it may need changes to its buffering andindexing settings to keep up with queries and return results quickly.

Poor Connectivity

  • A slow or poor network connection, or limited bandwidth between the client and database can cause high latency and processing times.

Query & Application Logic

  • A poorly written database query or improper application logic (such as unnecessary locking of database objects) can also result in performance issues.

Improperly Configured Server

  • An improperly or sub-optimally configured client, server, or database can cause any number of problems that can manifest in many ways. For example: A user with an incorrectly configured client or incorrect driver might be unable to connect to the database.
  • A poorly or incorrectly configured server can reduce performance, cause time outs or any number of possible errors.
  • The database may need to be configured to allow more connections or other settings, like caching or indexing, and may need to be adjusted to correct problems or improve overall performance.
  • Server configuration also significantly impacts performance and operation. Some examples of things you might change or configure to improve performance or correct a problem include:
    • Add more physical RAM or increase the memory assigned to the server.

    • Add more physical disk space or assign additional disk space to the server.

    • Consider upgrading the CPU or assigning more processing power to the server.

    • Consider defragmenting the hard disk. Fragmented data degrades overall performance.

Improper Client Configuration

  • Common issues with client configuration can be caused by incorrect login credentials, an incorrect host name or IP address, or even a corrupt or outdated connection driver. To fix these problems, check the client’s driver configuration and verify the following:
  • The username and password specified in the connection settings are correct.
  • Be sure to verify that the client is also configured to use the correct type of authentication,such as Windows or SQL authentication, for example.
  • The connection configuration settings, such as IP address, host name, and server name,are correct.
  • The driver version for the database application is up to date and correct.

Storage Configuration

The configuration of the database is something you need to monitor and continually evaluate to ensure it meets demand. Some examples of configuration settings you might need to change or correct are:

  • Increase the number of allowed connections to the database to meet increasing demand.
  • Change database buffering to improve performance.
  • Change database indexing to improve performance.

Sometimes configuring the storage system appropriately can alleviate performance issues, for example

  • placing frequently accessed tables on a faster disk.
  • Bugs in operating systems or in RDBMS software can result in errors and server crashes, so
  • ensure you regularly apply software patches and security updates to guard against this.

Poor Connectivity

Poor connectivity between the client and the database server can cause a wide range of problems, including poor performance, error messages, or loss of function. Some of the most common connectivity problems are typically caused by one of the following:

  • The database server cannot be reached or is not running properly.
  • The database instance on the server is not running properly.
  • The client login credentials are incorrect, or missing security settings such as for SSL connections.
  • The client configuration is incorrect.

Common Methods of Troubleshooting

Here are some common methods to help troubleshoot and solve basic connectivity issues.

  • Verify that the database server is running properly.
  • The exact procedure depends on your configuration and environment.
    • For example, you may need to physically check an on-premises server.
    • Or you may need to verify that a virtual machine in a cloud service is running.
  • Next, verify that the database instance on the server is running. This process varies depending on operating system and database.
    • For example, on a Windows-based system you could use the Task Manager to verify thatthe instance is running.
    • On a Db2 configuration, you could run db2cmd.exe and then issue commands in the command line.
  • Verify that the database can be reached from the client. A common method is to use the PING command from the client to communicate with the server’s IP address or host name.
  • Finally, verify that the client connection driver is configured correctly.
    • For example, make sure the user name and password for the connection are correct, and that other settings like IP address or host name, or security and encryption protocols are also correct.

Status Variables


When a database is experiencing a problem, you will first want to check the health and status of the database. All databases have a variety of commands and utilities that allow you to quickly get a snapshot of a database’s health and operational status. These utilities are accessed through either typed commands, or a graphical interface.

  • For example, from a command line in a UNIX environment, you might use the SERVICE MYSQL STATUS command to view MySQL status: And the output might indicate that the database server status is running or is OK

  • Available commands and syntax vary with the database you are using, such as Db2, MySQL, or PostgreSQL, and the environment in which you are using the database, such as UNIX or Windows.

    • For example, in Db2 you can run the db2pd command to monitor the status of a Db2 instance and run problem diagnostics.
    • For a database using MySQL you might use the SHOW STATUS command to get server status information.
    • And on a server running PostgreSQL, you might use the PG_ISREADY command to check the connection status of a PostgreSQL database server.

Databases use many status variables to provide information about their operation, and status variables can be either GLOBAL or SESSION based.

A GLOBAL status variable may represent status for some aspect of the server itself (for example, Aborted_connects), or the aggregated status over all connections to MySQL (for example, Bytes_received and Bytes_sent).

If a variable has no global value, the session value is displayed. A SESSION status variable, sometimes called a LOCAL status variable, represents values for the current connection. You can also use a LIKE clause and pattern with a SHOW SESSION statement to show status variable information that matches a specific pattern.

SHOW STATUS LIKE 'pattern';

For example, if you used the statement SHOW STATUS LIKE ‘KEY%’ you would only see the status for variables with key in the name.

Using GUIs: In addition to commands, you may also have a graphical interface with dashboards and reports for monitoring database status and information in real time.

  • For example, on a Microsoft SQL Server running on Windows Server, you could use Activity Monitor to get information about SQL Server processes and how the processes affect the

    current instance of SQL Server, and use

  • System Monitor to verify status and monitor SQL Server performance.

Logs


There are many possible log files you can use to help identify when and where an error occurred. The most used ones are:

  • Server and operating system logs, which log general server activity, connectivity, and other aspects of the server or servers running the database, and
  • Database error logs, which log information and errors specific to the database being operated, such as a MYSQL or PostgreSQL system.

Log files are important tools for discovering when an error occurred and the description of the error.

SQL Server Logs

  • For example, in a typical SQL server, some of the most accessed logs are:
    • The Error Log file, which is created every time SQL server is started
    • The Event Log file, which shows informational and error events, and
    • the Default Trace Log, which is an optional log file that tracks all database configuration changes.

Whether you use log files or receive an error message, you will most likely need to interpret specific error messages and error codes. For example, error messages or logged errors may contain:

  • A Message and ID number that can be used to troubleshoot the problem.
  • They may also contain other information, such as:
    • The name of the procedure that caused the problem, the state of the database when the error occurred, and
    • additional descriptive information, the severity of the problem, or both.
  • You might also see a line number referenced if the problem occurred in a script or batch file.

After checking database status and compiling information, your next step is to learn more about the error you are experiencing. There are many documentation and help sites available on the Internet with error code tables that can help you decode and correct errors. Popular resources include:

Performance Monitoring


Performance Logs

Performance monitoring, reports, and server and database logs help identify performance bottlenecks and determine the best way to correct them.

  • Performance monitoring helps identify potential network, server, and database issues before they occur and helps determine where improvements can be made.
  • Dashboards can monitor databases in real time and provide an early warning system for problems before they affect users, in addition to tracking historical performance and other metrics.
  • Server and database logs help identify a problem and when it occurred.

Diagnostic Logs

A diagnostic log tracks what is happening in a system component or an application such as a web server or a database. It contains information about an event, or a problem encountered when processing requests.

The diagnostic log is a record of significant events and errors in chronological order and is very useful for diagnosing or troubleshooting problems.

  • For example, if a web server receives a request to load a file that is missing, this error and its details can be logged to the webserver log.

The diagnostic logs are also sometimes referred to as troubleshooting logs, error logs, or event logs. It is essential to recognize that they may contain events, informational messages, warnings, or errors and their details. An event might be when a user connects to the database, or an informational message that includes information on a particular subsystem’s status. Additional events may be a warning message that might indicate that disk space for the database volume is running low, or an error message may that contains details about a failed database backup operation.

Some logs may have fewer or more categories of diagnostic messages such as FATAL and PANIC as additional classes of errors or may label them differently, for example, NOTE or NOTICE instead of Information.

A database administrator needs to be familiar with various types of diagnostic logs in the system. These include logs for server components, storage and other hardware devices, network, operating system, applications, and databases. While the systems administrator may monitor the other logs, the DBA needs to be intimately familiar with the database diagnostic logs, and review them frequently.

These database diagnostic logs are generally separate from the database transaction logs or query logs. In most cases, you can configure the location of the log files. However, certain logs, such as system log files, may have a fixed location. Many log files are recorded in plain text, and any text editor will do just fine to open them.

  • For example, on a machine with a Windows operating system, double-clicking on a LOG file will open Notepad by default.
  • However, some log files are machine-readable and may require particular viewers to see their contents.

Diagnostic logs can contain a variety of information. Some typical components of messages in a diagnostic log include:

  • The type and severity of issue – whether it is an event, informational, warning, or an error,
  • The corresponding error code and message,
  • The location, URL, or subsystem where the error occurred,
  • A timestamp,
  • The user’s IP and user agent,
  • And finally, any additional information depending on the event or error.

Db2 Log

Here is an example of a message in the Db2 diagnostic log file called db2diag.log, which is in ASCII or plain text format and which you can view using a text editor.

It includes a

  • timestamp of when the message was logged,
  • the type of message – in this case,it is an event related to starting the database manager (initiated by running db2start.exe),
  • it also includes additional details such as the Db2 software,
  • fixpack  and operating system levels,
  • and information about system resources (CPUs and memory).

Using the Db2 database manager configuration, you can configure the location of the db2 diagnostic log and the severity or diagnostic level of messages to log.

PostgreSQL Logs

Similarly, the PostgreSQL database typically logs all events like startup and shutdown, errors, connection issues, and so on. You can configure the location where the log entries are stored using the log_destination parameter in the postgresql.conf:

  • the syslog in case of Linux and Unix systems,
  • the event log in case of Windows systems,
  • to a CSVLOG file that you can easily import into a table and query the log entries using standard SQL statements, or
  • to STDERR (short for standard error), which is the default and typically outputs to the console, but can be configured using log_directory parameter to specify a particular location – the default being pg_log. You can also select the format of the log_filename that names the log files based on a specific string pattern that can include a timestamp.

MySQL Logs

In MySQL, distinct from the binary or the transaction logs that are used for recoverability, you can use other server logs for troubleshooting problems. These include:

  • The General Query Log, which logs connections and queries received from clients,
  • The Slow Query Log, which as the name implies,
  • logs queries that take longer than a specified time to execute.
  • And most importantly, the Error Log that contains diagnostic and error messages from MySQL D – the MySQL daemon or server process.

In addition to the error messages, the MySQL error log also includes warnings and notes during database server operation, startup, and shutdown.

You can configure MySQL to write messages to:

  • the Windows event log – which is the default setting,
  • the syslog or system log on Linux and Unix like systems,
  • the stderr or standard error – which is the default on Linux and Unix, or to a specified file.
  • You specify the filename for logging messages using the log_error option in MySQL D or the MySQL configuration file.

DBAs can configure how much information is stored in the error log, choosing between level 1 – errors only, level 2 – errors and warnings, and level 3 – errors, warnings, and notes. Level 3 is the default. The verbosity of messages is specified using the log_err_verbosity option.

Automating Tasks


Database automation: Uses unattended processes and self-updating procedures for administrative tasks leveraging existing processes and tools to make administration tasks simpler and quicker. Results in fewer deployment errors and higher reliability and speed on change implementations. Frees up staff otherwise occupied updating code and performing other work. And Works best with tasks that are time-consuming yet repetitive, such as database health check, alerts, and server/database maintenance.

Script automation is the process of using software to leverage and re-use existing scripts to deliver automation in a managed framework without requiring custom scripts to be developed and maintained each time.

Scripts are written to carry out routine, yet important, jobs which include:

  • Backing up and clearing event logs
  • Automating networking tasks
  • Monitoring system performance
  • Reading and writing to the registry
  • Managing various user accounts, computer accounts, printers, and applications and services.

There are several methods and tools for automating most database administration tasks. Some of these come built into a database, and others are performed by DBAs and developers either via scripts or database code. Common scrips that DBAs can write include processes to:

  • backup,
  • truncate,
  • load,
  • and restore databases.

When writing, updating, and maintaining scripts, it is important to have a version control system that keeps track of and retains an incremental history of code and database changes.

If necessary, the DBA can restore the database to a previous version if an error or other issue occurs. Database scripts work the best when developing within a version control environment.

The primary goal is to keep the code in sync with the database. You can also write scripts to perform certain database administrator (or DBA) tasks, such as:

  • sending error notifications,
  • moving archive logs from one storage to another storage container,
  • scheduling reports, and so on.

DBAs can also create scripts that are useful for the deployment of code changes from database to database. There are several ways for automating most database administration tasks;

  • these include tools or schedulers, such as cron jobs, and scripts using shell, Python or other scripting languages.

There are many good reasons why you would want to automate database tasks that include:

  • to increase throughput or productivity,
  • to improve quality or increase the predictability of quality,
  • to improve the robustness (or consistency), of processes or product,
  • to increase consistency of output or results,
  • to free up workforce to take on other roles,
  • and to provide higher-level jobs in the development, deployment, maintenance and running of the automated processes.

Tasks to Automate

  • Database Health Check – the process of inspecting a database to see how healthy and efficient the system is.

  • Alert Log File Cleanup – the process of deleting the chronological log of messages and errors written out by the database.

    • Typical messages found in this file include database startup, shutdown, log switches, space errors, and so on.
  • Trace File Cleanup – the process that deletes the trace file, or backup file, which is a snapshot that shows the process that was executing and modules that were loaded for an app at a point in time.

  • And Data Dictionary Statistics – the process where the system gathers a collection of names, definitions, and attributes about data elements that are being used or captured in a database, information system, or part of a research project. A Data Dictionary provides metadata about data elements.

  • Database Configuration Check – the process to check to see whether your database configuration still complies with the current recommendations for your system.

  • Schema Object check – the process of monitoring your database changes to quickly identify the weak links and problematic queries.

  • And Routine Daily Tasks using GUI Tools – the process of performing everyday functions in the database with a GUI, for example running reports and backing up files.

Database Testing

  • Database testing involves checking the database to ensure that everything is correct and running properly using a controlled testing environment.
  • Database testing involves checking the schema, tables, triggers, and so on.
  • Database testing is important in software testing because it ensures data values and information received and stored into the database are valid.
  • Helps to prevent data loss, saves aborted transaction data, prohibits unauthorized access to the information,
  • and checks the integrity and consistency of data.

Automated Software Testing

Automated software testing can look inside an application and see memory contents, data tables, file contents, and internal program states to determine if the product is behaving as expected. Once created, automated tests can be run repeatedly at no additional cost and are much faster and more secure than manual tests.

Automated tests can increase the depth and scope of tests to help improve quality and security of automation database testing.

Automating Reports & Alerts


Imagine you’re a DBA working for a small company, and each week you need to give your bosses the status of the databases for which you are responsible. How will you get that information, and what should you include?

RDBMSs include reporting functionality that gives you insights into the health of your databases, like the number of users connecting successfully or failing to connect, the amount of space used and the rate of increase, and the number of queries executed against the database.

You can create and configure reports with specific metrics to give you the information you need. Running reports on database health allows you to address issues before they become serious problems while allowing you to keep track of trends over time and help you to predict future needs and prepare for them.

You can automate reports to run daily, weekly, or monthly, depending on your needs. In larger companies, an entire department may manage automated reports, while in smaller companies, a few DBAs, or in some cases only one DBA, might perform that task. Some companies use third-party reporting tools that provide extra options and features. Whereas reports give a regular overview of database health, notifications are used when an issue should be tracked but doesn’t require your immediate attention.

  • Notifications give you, the DBA, an opportunity to track specific database events. You might choose to be notified when a user attempts to log in but fails. A few events like this are part of daily life, as users forget or mistype their passwords, but a cluster of login failure notifications may indicate a malicious attempt to gain access to data.

  • You can receive automated notifications through SMS messages, email, or via a dashboard. You need to configure your preferred option. Alerts quickly make you aware of issues that require your urgent attention. They are triggered by events like catastrophically low drive space or memory, scheduled jobs that have failed to complete, or error-level events in the error log.

  • You need to determine which alerts are appropriate for your environment and configure them to immediately reach the DBA on duty.

  • When configuring alerts and notifications, be careful not to set so many that you cannot respond to them all.

  • You should evaluate which ones are vital and configure them accordingly.

  • Alerts use two or more thresholds to help communicate the severity of an event.

    • A warning threshold, when the system sends a warning message, and
    • a critical threshold, when the system sends a critical alert message.
  • A best practice is to send out warning alerts when the threshold for the specified event reaches 85% and a critical alert when the threshold reaches 95%.

  • Most RDBMSs enable you to configure the content and frequency of reporting through a graphical interface. Many provide sample reports and enable you to configure your own if necessary.

  • Notifications and alerts function similarly, and you can configure them through a graphical interface, a command-line tool, or a script.

  • The process of automating reports, notifications, and alerts varies depending on which RDBMS you are using.