wget https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/example-guided-project/flights_RUSSIA_small.sql
# for local server go to directory
/mnt/d/data$ wget https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/example-guided-project/flights_RUSSIA_small.sql
--2024-10-03 08:03:31--  https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/example-guided-project/flights_RUSSIA_small.sql
Resolving cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud (cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud)... 169.63.118.104
Connecting to cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud (cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud)|169.63.118.104|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 103865229 (99M) [application/x-sql]
Saving to: ‘flights_RUSSIA_small.sql’

flights_RUSSIA_small.sql  100%[=================================================>]  99.05M  3.72MB/s    in 49s

2024-10-03 08:04:21 (2.01 MB/s) - ‘flights_RUSSIA_small.sql’ saved [103865229/103865229]
PostgreSQL Monitoring
Setup Db
Data
In this lab, you will use a database from https://postgrespro.com/education/demodb distributed under the PostgreSQL licence. It stores a month of data about airline flights in Russia and is organized according to the following schema:
Download Data
Since we’ll be using a cloud service some of the initial steps will vary.
- Start Server
- New terminal
- Download/import data
Postgre CLI
$ psql --username=postgres --host=localhost
psql (15.2 (Ubuntu 15.2-1.pgdg18.04+1), server 13.2)
Type "help" for help.

postgres=#
Create db from Script
postgres=# \i flights_RUSSIA_small.sql
List Tables
demo=# \dt
             List of relations
  Schema  |      Name       | Type  |  Owner
----------+-----------------+-------+----------
 bookings | aircrafts_data  | table | postgres
 bookings | airports_data   | table | postgres
 bookings | boarding_passes | table | postgres
 bookings | bookings        | table | postgres
 bookings | flights         | table | postgres
 bookings | seats           | table | postgres
 bookings | ticket_flights  | table | postgres
 bookings | tickets         | table | postgres
(8 rows)
Monitor w CLI
Database monitoring refers to reviewing the operational status of your database and maintaining its health and performance. With proper, proactive monitoring, a database can maintain consistent performance, and any problems that emerge, such as sudden outages, can be identified and resolved in a timely manner.
Tools such as pgAdmin, an open source graphical user interface (GUI) tool for PostgreSQL, come with several features that can help monitor your database. The main focus in this lab will be using the command line interface to monitor your database, but we’ll also take a quick look at how the monitoring process can be replicated in pgAdmin.
Monitoring these statistics can be helpful in understanding your server and its databases, and in detecting any anomalies and problems that may arise.
Server Activity
This query will retrieve the following:
Column | Description |
---|---|
pid | Process ID |
usename | Name of user logged in |
datname | Name of database |
state | Current state, with two common values being: active (executing a query) and idle (waiting for new command) |
state_change | Time when the state was last changed |
This information comes from pg_stat_activity, one of the built-in statistics views provided by PostgreSQL.
- As you can see, there are currently 6 connections to the server, with two of them being connected to databases that you’re familiar with. After all, you started in the default postgres database, which is now idle, and now you’re actively querying in the demo database.
- To see what other columns are available for viewing, feel free to take a look at the pg_stat_activity documentation!
demo=# SELECT pid, usename, datname, state, state_change FROM pg_stat_activity;

# OUTPUT
  pid | usename  | datname | state  |         state_change
------+----------+---------+--------+-------------------------------
   43 |          |         |        |
   45 | postgres |         |        |
 1050 | postgres | demo    | active | 2024-10-03 13:14:20.153269+00
   41 |          |         |        |
   40 |          |         |        |
   42 |          |         |        |
(6 rows)
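Beyond a point-in-time snapshot, pg_stat_activity is often used to spot queries that have been running for a long time. As a sketch (the five-minute threshold is an arbitrary example, not part of this lab; query_start is a standard pg_stat_activity column):

```sql
-- Show active queries that have been running for more than 5 minutes
SELECT pid, usename, datname,
       now() - query_start AS runtime,
       query
FROM pg_stat_activity
WHERE state = 'active'
  AND now() - query_start > interval '5 minutes'
ORDER BY runtime DESC;
```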
Monitor Last Query
Let’s say you wanted to see all the aforementioned columns, in addition to the actual text of the query that was last executed. Which column should you add to review that?
- If you wanted to see which query was most recently executed, you can add the query column.
- This column returns the most recent query.
- If state is active, it’ll show the currently executed query. If not, it’ll show the last query that was executed.
demo=# SELECT pid, usename, datname, state, state_change, query FROM pg_stat_activity;

# OUTPUT
  pid | usename  | datname | state  |         state_change          |                                      query
------+----------+---------+--------+-------------------------------+---------------------------------------------------------------------------------
   43 |          |         |        |                               |
   45 | postgres |         |        |                               |
 1050 | postgres | demo    | active | 2024-10-03 13:18:45.030559+00 | SELECT pid, usename, datname, state, state_change, query FROM pg_stat_activity;
   41 |          |         |        |                               |
   40 |          |         |        |                               |
   42 |          |         |        |                               |
(6 rows)
Active Processes
To see which processes are active, use the following query:
demo=# SELECT pid, usename, datname, state, state_change, query FROM pg_stat_activity WHERE state = 'active';

# OUTPUT
  pid | usename  | datname | state  |         state_change          |                                                  query
------+----------+---------+--------+-------------------------------+--------------------------------------------------------------------------------------------------------
 1050 | postgres | demo    | active | 2024-10-03 13:22:19.217489+00 | SELECT pid, usename, datname, state, state_change, query FROM pg_stat_activity WHERE state = 'active';
(1 row)
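If an active query ever turns out to be stuck, the pid from pg_stat_activity can be passed to PostgreSQL’s built-in signal functions. A sketch, not part of this lab (1050 is just the example process from the output above; be careful terminating sessions on a shared server):

```sql
-- Politely cancel the current query of backend 1050
SELECT pg_cancel_backend(1050);

-- If cancelling isn't enough, terminate the whole session
SELECT pg_terminate_backend(1050);
```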
Database Activity
To look at db activity, use the query below to extract:
Column | Description |
---|---|
datname | Name of database |
tup_inserted | Number of rows inserted by queries in this database |
tup_updated | Number of rows updated by queries in this database |
tup_deleted | Number of rows deleted by queries in this database |
This information comes from pg_stat_database, another of the statistics views provided by PostgreSQL.
demo=# SELECT datname, tup_inserted, tup_updated, tup_deleted FROM pg_stat_database;

# OUTPUT
  datname  | tup_inserted | tup_updated | tup_deleted
-----------+--------------+-------------+-------------
           |            2 |           1 |           0
 postgres  |            0 |           0 |           0
 demo      |      2290162 |          22 |           0
 template1 |            0 |           0 |           0
 template0 |            0 |           0 |           0
(5 rows)
- As you can see, two of the databases returned are postgres and demo. These are databases that you are familiar with.
- The other two, template1 and template0, are default templates for databases and can be overlooked in this analysis.
- Based on this output, you now know that demo had 2,290,162 rows inserted and 22 rows updated.
- To see what other columns are available for viewing, you can read through the pg_stat_database documentation.
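Among those other columns, pg_stat_database also exposes block-level counters (blks_read, blks_hit) that are commonly combined into a cache hit ratio. As a sketch, beyond what this lab covers:

```sql
-- Fraction of block reads served from shared buffers rather than disk
SELECT datname,
       blks_hit,
       blks_read,
       round(blks_hit::numeric / NULLIF(blks_hit + blks_read, 0), 4) AS cache_hit_ratio
FROM pg_stat_database
WHERE datname = 'demo';
```

A ratio close to 1 suggests the working set fits comfortably in shared buffers.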
Monitor Rows Fetched
Let’s say you wanted to see the number of rows fetched and returned by this database. Note:
- The number of rows fetched (tup_fetched) is the number of rows that were returned as output to queries.
- The number of rows returned (tup_returned) is the number of rows that were read and scanned by queries.
- Notice how the rows returned tend to be greater than the rows fetched.
- If you consider how tables are read, this makes sense: not all the rows that are scanned end up being returned.
demo=# SELECT datname, tup_fetched, tup_returned FROM pg_stat_database;

# OUTPUT
  datname  | tup_fetched | tup_returned
-----------+-------------+--------------
           |        2113 |         2354
 postgres  |        7086 |        43523
 demo      |      586118 |      8359082
 template1 |           0 |            0
 template0 |           0 |            0
(5 rows)
One db Query
What if you only wanted to see the database details (rows inserted, updated, deleted, returned, and fetched) for demo?
demo=# SELECT datname, tup_inserted, tup_updated, tup_deleted, tup_fetched, tup_returned FROM pg_stat_database WHERE datname = 'demo';

# OUTPUT
 datname | tup_inserted | tup_updated | tup_deleted | tup_fetched | tup_returned
---------+--------------+-------------+-------------+-------------+--------------
 demo    |      2290162 |          22 |           0 |      586188 |      8364313
(1 row)
Performance over Time
Extensions, which can enhance your PostgreSQL experience, can be helpful in monitoring your database. One such extension is pg_stat_statements, which gives you an aggregated view of query statistics.
- To enable the extension, enter the following command
- This will enable the pg_stat_statements extension, which will start to track the statistics for your database.
demo=# CREATE EXTENSION pg_stat_statements;
CREATE EXTENSION
Edit Config File
- Now, let’s edit the PostgreSQL configuration file to include the extension you just added
- For the changes to take effect, you will have to restart your database.
demo=# ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';
- You can do that by typing exit in the terminal to stop your current session
- Close the terminal and return to the PostgreSQL tab
- Select Stop
- Once the session has become inactive, select Start
Connect to db
- Now that you’ve restarted the server after altering the config file
- Connect to db
postgres=# \connect demo

# RESPONSE
psql (15.2 (Ubuntu 15.2-1.pgdg18.04+1), server 13.2)
You are now connected to database "demo" as user "postgres".
Verify Extension
Let’s verify the extension was added; you can see that pg_stat_statements has been installed.
demo=# \dx

# OUTPUT
                                   List of installed extensions
        Name        | Version |   Schema   |                               Description
--------------------+---------+------------+------------------------------------------------------------------------
 pg_stat_statements | 1.8     | bookings   | track planning and execution statistics of all SQL statements executed
 plpgsql            | 1.0     | pg_catalog | PL/pgSQL procedural language
(2 rows)
Check the shared_preload_libraries
demo=# show shared_preload_libraries;
 shared_preload_libraries
--------------------------
 pg_stat_statements
(1 row)
Turn on Expanded Table Formatting
Since the results returned by pg_stat_statements can be quite long, let’s turn on expanded table formatting with the following command. This will display the output in an expandable table format.
You can turn it off by running the same command again
demo=# \x
Expanded display is on.
Db ID
- From the pg_stat_statements documentation, you’ll see the various columns available to be retrieved.
- Let’s say we wanted to retrieve the db ID, the query, and the total time that it took to execute the statement (in milliseconds)
- Now you can scroll through the results
- Unlike pg_stat_activity, which showed the latest query that was run, pg_stat_statements shows an aggregated view of the queries that were run since the extension was installed.
demo=# SELECT dbid, query, total_exec_time FROM pg_stat_statements;
-[ RECORD 1 ]---+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
dbid            | 16384
query           | SELECT e.extname AS "Name", e.extversion AS "Version", n.nspname AS "Schema", c.description AS "Description"                                                                                  +
                | FROM pg_catalog.pg_extension e LEFT JOIN pg_catalog.pg_namespace n ON n.oid = e.extnamespace LEFT JOIN pg_catalog.pg_description c ON c.objoid = e.oid AND c.classoid = $1::pg_catalog.regclass+
                | ORDER BY 1
total_exec_time | 0.197333
-[ RECORD 2 ]---+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
dbid            | 16384
query           | show shared_preload_libraries
total_exec_time | 0.01642
Match ID to Db
- What if you wanted to find out which db matches the ID?
- Based on this, you can now see that database ID 16384 is the demo database.
- This makes sense because you performed the query show shared_preload_libraries on the demo database, which appeared in pg_stat_statements.
demo=# SELECT oid, datname FROM pg_database;
-[ RECORD 1 ]------
oid     | 13442
datname | postgres
-[ RECORD 2 ]------
oid     | 16384
datname | demo
-[ RECORD 3 ]------
oid     | 1
datname | template1
-[ RECORD 4 ]------
oid     | 13441
datname | template0
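The two lookups can also be combined into a single query by joining pg_stat_statements to pg_database on the database OID, so each statement is shown alongside its database name directly:

```sql
-- Resolve dbid to a database name in one step
SELECT d.datname, s.query, s.total_exec_time
FROM pg_stat_statements s
JOIN pg_database d ON d.oid = s.dbid;
```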
Drop Extension
It’s important to note that adding these extensions can increase your server load, which may affect performance. If you need to drop the extension, you can achieve that with the following command
demo=# DROP EXTENSION pg_stat_statements;
DROP EXTENSION
Check Extensions List
Let’s check the list to make sure it has been deleted
demo=# \dx
List of installed extensions
-[ RECORD 1 ]-----------------------------
Name        | plpgsql
Version     | 1.0
Schema      | pg_catalog
Description | PL/pgSQL procedural language
Reset Config File
demo=# ALTER SYSTEM RESET shared_preload_libraries;
ALTER SYSTEM
List Extensions
Just as we did earlier, after altering the config file, we need to
- exit
- Stop server
- Restart
- Open Postgre CLI
- Connect to db
postgres=# \connect demo
demo=# \dx

# OUTPUT
        List of installed extensions
  Name   | Version |   Schema   |         Description
---------+---------+------------+------------------------------
 plpgsql | 1.0     | pg_catalog | PL/pgSQL procedural language
(1 row)

demo=# show shared_preload_libraries;

# OUTPUT
 shared_preload_libraries
--------------------------

(1 row)
Monitor w pgAdmin
- Launch it
- Left menu > Servers > Password
- Home page = Dashboard
The table below lists the statistics displayed on the Dashboard that correspond with the statistics we accessed with the CLI above:
Chart | Description |
---|---|
Server/Database sessions | Displays the total sessions that are running. For servers, this is similar to the pg_stat_activity, and for databases, this is similar to the pg_stat_database. |
Transactions per second | Displays the commits, rollbacks, and transactions taking place. |
Tuples in | Displays the number of tuples (rows) that have been inserted, updated, and deleted, similar to the tup_inserted, tup_updated, and tup_deleted columns from pg_stat_database. |
Tuples out | Displays the number of tuples (rows) that have been fetched (returned as output) or returned (read or scanned). This is similar to tup_fetched and tup_returned from pg_stat_database. |
Server activity | Displays the sessions, locks, prepared transactions, and configuration for the server. In the Sessions tab, it offers a look at the breakdown of the sessions that are currently active on the server, similar to the view provided by pg_stat_activity. To check for any new processes, you can select the refresh button at the top-right corner. |
Reset Charts
- Stop the server - all the charts above quiet down to zero
- Wait till it’s inactive
- Click on Postgre CLI to start another instance
- It might ask you for the password again
You may have noticed that the Server sessions chart increased from 7 to 8 sessions. This makes sense since you started a new session with the PostgreSQL CLI.
To see that change reflected in Server activity, click the refresh button; an additional postgres database session will appear.
Connect to db
- Let’s connect to demo
- Left menu of pgAdmin
- Under postgres>Databases
- Click on demo to connect to the db
- Notice the number of tuples out
Run Query on db
- Let’s run a query on the database! To do that, navigate to the menu bar and select Tools > Query Tool, or choose it from the tabs while in the demo database
- You can run any query. To keep things simple, let’s run the following to select all the data from the bookings table
- You’ll see the results below the query screen

SELECT * FROM bookings;
Db Activity
- Switch back to the db dashboard
- You can refresh the Server activity and check to see if any of the charts have shown a spike since the data was retrieved.
- You may have noticed that the number of tuples (rows) returned (read/scanned) was greater than 250,000.
Explain
- Check the number of rows scanned using a query
- If you can’t see the full text, simply drag the QUERY PLAN column out.
- This statement reveals that 263,887 rows were scanned, which is similar to the amount that was read/scanned based on the spike in the Tuples out chart.
EXPLAIN SELECT * FROM bookings;
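Note that plain EXPLAIN only shows the planner’s estimated row counts. If you also want actual row counts and timings, EXPLAIN ANALYZE executes the query for real (so use it cautiously with statements that modify data). A sketch, beyond what this lab requires:

```sql
-- Runs the query and reports actual rows and timing per plan node
EXPLAIN ANALYZE SELECT * FROM bookings;
```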
While you can monitor your database through the command line alone, tools like pgAdmin can be helpful in providing a visual representation of how your server and its databases are performing.
Optimize db
Data optimization is the maximization of the speed and efficiency of retrieving data from your database. Optimizing your database will improve its performance, whether that’s inserting or retrieving data from your database. Doing this will improve the experience of anyone interacting with the database.
Similar to MySQL, there are optimal data types and maintenance (otherwise known as “vacuuming”) that can be applied to optimize databases.
Optimize Data Types
When it comes to optimizing data types, understanding the data values will help in selecting the proper data type for the column.
Let’s take a look at an example in the demo database.
Return to the CLI session that you opened previously (or open a new session if it has been closed).
If you’re no longer connected to the demo database, you can reconnect to it!
postgres=# \connect demo
psql (15.2 (Ubuntu 15.2-1.pgdg18.04+1), server 13.2)
You are now connected to database "demo" as user "postgres".
demo=# \dt
             List of relations
  Schema  |      Name       | Type  |  Owner
----------+-----------------+-------+----------
 bookings | aircrafts_data  | table | postgres
 bookings | airports_data   | table | postgres
 bookings | boarding_passes | table | postgres
 bookings | bookings        | table | postgres
 bookings | flights         | table | postgres
 bookings | seats           | table | postgres
 bookings | ticket_flights  | table | postgres
 bookings | tickets         | table | postgres
(8 rows)
Let’s look at the data from the first table
demo=# SELECT * FROM aircrafts_data;
 aircraft_code |             model             | range
---------------+-------------------------------+-------
 773           | {"en": "Boeing 777-300"}      | 11100
 763           | {"en": "Boeing 767-300"}      |  7900
 SU9           | {"en": "Sukhoi Superjet-100"} |  3000
 320           | {"en": "Airbus A320-200"}     |  5700
 321           | {"en": "Airbus A321-200"}     |  5600
 319           | {"en": "Airbus A319-100"}     |  6700
 733           | {"en": "Boeing 737-300"}      |  4200
 CN1           | {"en": "Cessna 208 Caravan"}  |  1200
 CR2           | {"en": "Bombardier CRJ-200"}  |  2700
(9 rows)
You can see that there are 9 entries in total with three columns: aircraft_code, model, and range.
For our purposes, we’ll create a hypothetical situation that will potentially require changing the data types of columns to optimize them:
- Let’s say that aircraft_code is always set to three characters
- model will always be in a JSON format
- range has a maximum value of 12,000 and a minimum value of 1,000
Let’s change the types to:
- aircraft_code: char(3), since you know that the aircraft code will always be fixed to three characters.
- model: json, which is a special data type that PostgreSQL supports.
- range: smallint, since the range of its numbers falls between -32,768 and 32,767.
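You can sanity-check the space saving with pg_column_size, which reports the on-disk size of a value; an integer occupies 4 bytes while a smallint occupies 2:

```sql
SELECT pg_column_size(12000::integer)  AS integer_bytes,   -- 4
       pg_column_size(12000::smallint) AS smallint_bytes;  -- 2
```

Two bytes per row is small here, but the same reasoning pays off on tables with millions of rows.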
Check Data Types
You can check the current data types (and additional details such as the indexes and constraints) of the aircrafts_data table with the following:
demo=# \d aircrafts_data

# OUTPUT
                 Table "bookings.aircrafts_data"
    Column     |     Type     | Collation | Nullable | Default
---------------+--------------+-----------+----------+---------
 aircraft_code | character(3) |           | not null |
 model         | jsonb        |           | not null |
 range         | integer      |           | not null |
Indexes:
    "aircrafts_pkey" PRIMARY KEY, btree (aircraft_code)
Check constraints:
    "aircrafts_range_check" CHECK (range > 0)
Referenced by:
    TABLE "flights" CONSTRAINT "flights_aircraft_code_fkey" FOREIGN KEY (aircraft_code) REFERENCES aircrafts_data(aircraft_code)
    TABLE "seats" CONSTRAINT "seats_aircraft_code_fkey" FOREIGN KEY (aircraft_code) REFERENCES aircrafts_data(aircraft_code) ON DELETE CASCADE
- Notice that most of the columns in this table have been optimized for our sample scenario, except for the range.
- This may be because the range was unknown in the original database.
- Let’s take the opportunity to optimize that column for your hypothetical situation.
- You can do this by changing the data type of the column.
You’ll first need to drop a view, which is another way our data can be presented, in order to change the column’s data type. Otherwise, you will encounter an error. This is a special case for this database because you loaded a SQL file that included commands to create views. In your own database, you may not need to drop a view.
Drop View
demo=# DROP VIEW aircrafts;
DROP VIEW
Change Type
demo=# ALTER TABLE aircrafts_data ALTER COLUMN range TYPE smallint;
ALTER TABLE
Review
demo=# \d aircrafts_data
                 Table "bookings.aircrafts_data"
    Column     |     Type     | Collation | Nullable | Default
---------------+--------------+-----------+----------+---------
 aircraft_code | character(3) |           | not null |
 model         | jsonb        |           | not null |
 range         | smallint     |           | not null |
Indexes:
    "aircrafts_pkey" PRIMARY KEY, btree (aircraft_code)
Check constraints:
    "aircrafts_range_check" CHECK (range > 0)
Referenced by:
    TABLE "flights" CONSTRAINT "flights_aircraft_code_fkey" FOREIGN KEY (aircraft_code) REFERENCES aircrafts_data(aircraft_code)
    TABLE "seats" CONSTRAINT "seats_aircraft_code_fkey" FOREIGN KEY (aircraft_code) REFERENCES aircrafts_data(aircraft_code) ON DELETE CASCADE
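If you still need the aircrafts view after the type change, it has to be recreated. The sketch below is a simplified, hypothetical definition, not the demo database’s original one (inspect the real definition with \d+ aircrafts before dropping it, so you can restore it exactly):

```sql
-- Hypothetical replacement view exposing the English model name
CREATE VIEW aircrafts AS
SELECT aircraft_code,
       model ->> 'en' AS model,
       range
FROM aircrafts_data;
```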
Vacuum Database
In your day-to-day life, you vacuum your rooms to keep them neat and tidy. You can do the same with databases by maintaining and optimizing them with some vacuuming.
- In PostgreSQL, vacuuming means cleaning out your databases by reclaiming storage from “dead tuples”: rows that have been deleted (or made obsolete by updates) but whose storage has not yet been cleaned out.
- Generally, the autovacuum feature is automatically enabled, meaning that PostgreSQL will automate the vacuum maintenance process for you.
- You can check if this is enabled with the following command:
demo=# show autovacuum;
 autovacuum
------------
 on
(1 row)
- Since autovacuum is enabled, let’s check to see when your database was last vacuumed.
- To do that, you can use the pg_stat_user_tables view, which displays statistics about each user table (as opposed to system tables) in the database.
- The columns that are returned are the same ones listed in pg_stat_all_tables documentation.
- What if you wanted to check each table (by name), the estimated number of dead rows that it has, the last time it was automatically analyzed, and how many times it has been autovacuumed?
- To select the table name (relname), the number of dead rows (n_dead_tup), the time of the last automatic analyze (last_autoanalyze), and the number of times the table has been autovacuumed (autovacuum_count), you can use the following query:
demo=# SELECT relname, n_dead_tup, last_autoanalyze, autovacuum_count FROM pg_stat_user_tables;

     relname     | n_dead_tup |       last_autoanalyze        | autovacuum_count
-----------------+------------+-------------------------------+------------------
 seats           |          0 | 2024-10-03 13:09:33.469043+00 |                2
 tickets         |        657 | 2024-10-03 13:09:38.983979+00 |                1
 aircrafts_data  |          9 |                               |                0
 flights         |        642 | 2024-10-03 13:09:33.453971+00 |                1
 bookings        |       1000 | 2024-10-03 13:09:32.98207+00  |                1
 ticket_flights  |       1000 | 2024-10-03 13:10:33.744129+00 |                1
 boarding_passes |       1000 | 2024-10-03 13:10:34.950993+00 |                1
 airports_data   |          0 | 2024-10-03 13:09:32.581493+00 |
(8 rows)
Notice that most of these tables have only a small number of dead tuples and have been autovacuumed once or twice so far. This makes sense given that the database was just created and, based on the timestamps, autovacuumed shortly after the data was loaded.
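Autovacuum normally takes care of this for you, but you can also vacuum on demand, which is handy right after large deletes. As a sketch (VERBOSE and ANALYZE are optional; ANALYZE additionally refreshes the planner statistics for the table):

```sql
-- Reclaim dead-tuple storage in one table and report what was done
VACUUM (VERBOSE, ANALYZE) bookings.tickets;
```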