Connect to DBs

R & SQLite


This section will cover how to connect to DBs from within RStudio with a combination of R & SQL

Using .sql file

World_Economies

conn=DBI::dbConnect(RSQLite::SQLite(),'D:/data/GDP/World_Economies.db')
  • After you setup the connection you have to reference it and instruct RStudio to use SQL with this
  • {sql connection = conn}
  • as the first line in the SQL code chunk

Preview

SELECT  *
FROM    Countries_by_GDP
Displaying records 1 - 10
Country GDP_USD_billions
United States 26854.60
China 19373.59
Japan 4409.74
Germany 4308.85
India 3736.88
United Kingdom 3158.94
France 2923.49
Italy 2169.74
Canada 2089.67
Brazil 2081.24

List Tables

SELECT name FROM sqlite_master WHERE type='table';
1 records
name
Countries_by_GDP

Table Attributes

PRAGMA table_info([table_name])

INSTRUCTOR

When updating a db make sure you commit() as some require it. In this example the data can only be accessed from the connection that created and edited the table. When I tried to access it from other connections it would not display any information. I went back to the original connection and conn.commit(). and it worked fine after that.

conn2=DBI::dbConnect(RSQLite::SQLite(),'D:/data/sqlite3/INSTRUCTOR.db')

List Tables

SELECT name FROM sqlite_master WHERE type='table';
4 records
name
Sc_table
Sc_table2
Instructor
Scores

Preview

SELECT   *
FROM     Sc_table
LIMIT    10
0 records
country first_name last_name test_score

STAFF db

conn4=DBI::dbConnect(RSQLite::SQLite(),'D:/data/STAFF.db')

List all Tables

SELECT name FROM sqlite_master WHERE type='table';
1 records
name
INSTRUCTOR

Preview

SELECT *
FROM   INSTRUCTOR
LIMIT  10
Displaying records 1 - 10
ID FNAME LNAME CITY CCODE
1 Rav Ahuja TORONTO CA
2 Raul Chong Markham CA
3 Hima Vasudevan Chicago US
4 John Thomas Illinois US
5 Alice James Illinois US
6 Steve Wells Illinois US
7 Santosh Kumar Illinois US
8 Ahmed Hussain Illinois US
9 Nancy Allen Illinois US
10 Mary Thomas Illinois US

R & SQLite


Using .r file

library(RSQLite)

# create connection
conn5 <- dbConnect(RSQLite::SQLite(),'D:/data/sqlite3/INSTRUCTOR.db' )

# List Tables
dbListTables(conn5)
[1] "Instructor" "Sc_table"   "Sc_table2"  "Scores"    
preview <- dbGetQuery(conn5, "SELECT * FROM Instructor")
preview
  ID FNAME     LNAME      CITY CCODE
1  1   Rav     Ahuja MOOSETOWN    CA
2  2  Raul     Chong   Markham    CA
3  3  Hima Vasudevan   Chicago    US

Python & SQLite


Refer to Data/SQLite3 page for more details. Here is a summary table: # Python & SQL Summary


SQLite

Topic Syntax Description
connect() sqlite3.connect() Create a new database and open a database connection to allow sqlite3 to work with it. Call sqlite3.connect() to create a connection to the database INSTRUCTOR.db in the current working directory, implicitly creating it if it does not exist.
cursor() con.cursor() To execute SQL statements and fetch results from SQL queries, use a database cursor. Call con.cursor() to create the Cursor.
execute() cursor_obj.execute() The execute method in Python’s SQLite library allows to perform SQL commands, including retrieving data from a table using a query like “Select * from table_name.” When you execute this command, the result is obtained as a collection of table data stored in an object, typically in the form of a list of lists.
fetchall() cursor_obj.fetchall() The fetchall() method in Python retrieves all the rows from the result set of a query and presents them as a list of tuples.
fetchmany() cursor_obj.fetchmany() The fetchmany() method retrieves the subsequent group of rows from the result set of a query rather than just a single row. To fetch a few rows from the table, use fetchmany(numberofrows) and mention how many rows you want to fetch.
read_sql_query() read_sql_query() read_sql_query() is a function provided by the Pandas library in Python, and it is not specific to MySQL. It is a generic function used for executing SQL queries on various database systems, including MySQL, and retrieving the results as a Pandas DataFrame.
shape dataframe.shape It provides a tuple indicating the shape of a DataFrame or Series, represented as (number of rows, number of columns).
close() con.close() con.close() is a method used to close the connection to a MySQL database. When called, it terminates the connection, releasing any associated resources and ensuring the connection is no longer active. This is important for managing database connections efficiently and preventing resource leaks in your MySQL database interactions.
CREATE TABLE CREATE TABLE table_name ( column1 datatype constraints, column2 datatype constraints, ... ); The CREATE TABLE statement is used to define and create a new table within a database. It specifies the table’s name, the structure of its columns (including data types and constraints), and any additional properties such as indexes. This statement essentially sets up the blueprint for organizing and storing data in a structured format within the database.
barplot() seaborn.barplot(x="x-axis_variable", y="y-axis_variable", data=data) seaborn.barplot() is a function in the Seaborn Python data visualization library used to create a bar plot, also known as a bar chart. It is particularly used to display the relationship between a categorical variable and a numeric variable by showing the average value for each category.
read_csv() df = pd.read_csv('file_path.csv') read_csv() is a function in Python’s Pandas library used for reading data from a Comma-Separated Values (CSV) file and loading it into a Pandas DataFrame. It’s a common method for working with tabular data stored in CSV format
to_sql() df.to_sql('table_name', index=False) df.to_sql() is a method in Pandas, a Python data manipulation library used to write the contents of a DataFrame to a SQL database. It allows to take data from a DataFrame and store it structurally within a SQL database table.
read_sql() df = pd.read_sql(sql_query, conn) read_sql() is a function provided by the Pandas library in Python for executing SQL queries and retrieving the results into a DataFrame from an SQL database. It’s a convenient way to integrate SQL database interactions into your data analysis workflows.

Db2

Topic Syntax Description Example
connect() conn = ibm_db.connect('DATABASE=dbname; HOST=hostname;PORT=port;UID=username; PWD=password;', '', '') ibm_db.connect() is a Python function provided by the ibm_db library, which is used for establishing a connection to an IBM Db2 or IBM Db2 Warehouse database. It’s commonly used in applications that need to interact with IBM Db2 databases from Python.
server_info() ibm_db.server_info() ibm_db.server_info(conn) is a Python function provided by the ibm_db library, which is used to retrieve information about the IBM Db2 server to which you are connected.
close() con.close() con.close() is a method used to close the connection to a db2 database. When called, it terminates the connection, releasing any associated resources and ensuring the connection is no longer active. This is important for managing database connections efficiently and preventing resource leaks in your db2 database interactions.
exec_immediate() sql_statement = "SQL statement goes here"stmt = ibm_db.exec_immediate(conn, sql_statement) ibm_db.exec_immediate() is a Python function provided by the ibm_db library, which is used to execute an SQL statement immediately without the need to prepare or bind it. It’s commonly used for executing SQL statements that don’t require input parameters or don’t need to be prepared in advance.

Python & DB2


Note: cannot use Jupyter Notebook from outside Db2 cloud with the instructions detailed below. To access IBM Db2 from outside need to load different packages from the ones specified below

In order to get the connection credentials:

  • Go to cloud.ibm.com/resources
  • Find Databases dropdown on the resources table
  • Choose Db2-xx
  • Service credentials form the left menu
  • The first time this page will be empty
  • Click > New Credentials
  • Next page will have Name: … - Leave it
  • Role: Manager - Leave it
  • Click > Add
  • Now you’ll see the credential listed
  • Check the box on the left and
  • Click on Down Arrow to view the credential details

So as you see above we’ve already created a Db2 instance and we’ve retrieved our credentials in Data/Connect Script page.

Import ibm_db Library

The ibm_db API provides a variety of useful Python functions for accessing and manipulating data in an IBM® data server database, including functions for connecting to a database, preparing and issuing SQL statements, fetching rows from result sets, calling stored procedures, committing and rolling back transactions, handling errors, and retrieving metadata.

library(reticulate)
#py_install("ibm_db")
import os
os.add_dll_directory(r'c:\users\emhrc\onedrive\docume~1\virtua~1\r-reti~1\lib\site-packages')

import ibm_db

Define Credentials

dsn_hostname = "6667d8e9-9d4d-4ccb-ba32-21da3bb5aafc.c1ogj3sd0tgtu0lqde00.databases.appdomain.cloud" 
dsn_uid = "nzx60439"        
dsn_pwd = "XqJtPBH1PGw3rnjF"     

dsn_driver = "{IBM DB2 ODBC DRIVER}"
dsn_database = "bludb"            
dsn_port = "30376"                 
dsn_protocol = "TCPIP"            
dsn_security = "SSL"

Create Connection

Ibm_db API uses the IBM Data Server Driver for ODBC and CLI APIs to connect to IBM DB2 and Informix.

  • Lets build the dsn connection string using the credentials entered above
#Create the dsn connection string
dsn = (
    "DRIVER={0};"
    "DATABASE={1};"
    "HOSTNAME={2};"
    "PORT={3};"
    "PROTOCOL={4};"
    "UID={5};"
    "PWD={6};"
    "SECURITY={7};").format(dsn_driver, dsn_database, dsn_hostname, dsn_port, dsn_protocol, dsn_uid, dsn_pwd,dsn_security)

#print the connection string to check correct values are specified
print(dsn)

Establish Connection

try:
    conn = ibm_db.connect(dsn, "", "")
    print ("Connected to database: ", dsn_database, "as user: ", dsn_uid, "on host: ", dsn_hostname)

except:
    print ("Unable to connect: ", ibm_db.conn_errormsg() )

Close Connection

ibm_db.close(conn)

# OUTPUT
TRUE

R Studio & SQLite3


Here is a way to connect to a local SQLite3 db using RStudio

  • File > New File > SQL Script
  • The new file will open this this line on top
-- !preview conn=DBI::dbConnect(RSQLite::SQLite(),'D:/data/sqlite3/INSTRUCTOR.db')
  • Obviously as you see above it setup a connection to the db already

List Tables in db

SELECT name
FROM   sqlite_master
WHERE  type='table'

Preview the Results

  • As you save the code, you’ll see in the lower pane (next to console, terminal tabs) a SQL Results tab
  • Open that tab and you’ll see the results of the SQL Query there, in this case the list of tables in the db
  • If you wanted to run it here in quarto make sure you supply the conn to the sql code chunk
  • Add #| connection = conn
# This is R code
conn=DBI::dbConnect(RSQLite::SQLite(),'D:/data/sqlite3/INSTRUCTOR.db')
/* To supply a connection access to the sql chunk put this as first line: #| connection = conn */
SELECT name
FROM   sqlite_master
WHERE  type='table'
4 records
name
Sc_table
Sc_table2
Instructor
Scores

Close Connection

  • Make sure to always close connection to db
DBI::dbDisconnect(conn)