Jupyter on Redshift
- 08 Jul 2021
- 1 Minute to read
Panoply on Redshift
To connect to your Panoply data warehouse using Python and a Jupyter notebook:
Install SQLAlchemy and pandas in your Python environment:
pip install sqlalchemy pandas
or if conda is your package manager of choice:
conda install sqlalchemy pandas
Create a new Jupyter Notebook and import the packages you'll need:
import pandas as pd
import sqlalchemy
from sqlalchemy import create_engine
Enter the connection parameters for your Panoply data warehouse:
POSTGRES_ADDRESS = 'db.panoply.io'  ## INSERT YOUR DB ADDRESS IF IT'S NOT ON PANOPLY
POSTGRES_PORT = '5439'
POSTGRES_USERNAME = 'username'  ## CHANGE THIS TO YOUR PANOPLY/POSTGRES USERNAME
POSTGRES_PASSWORD = '*****'  ## CHANGE THIS TO YOUR PANOPLY/POSTGRES PASSWORD
POSTGRES_DBNAME = 'database'  ## CHANGE THIS TO YOUR DATABASE NAME
postgres_str = ('postgresql://{username}:{password}@{ipaddress}:{port}/{dbname}'
                .format(username=POSTGRES_USERNAME,
                        password=POSTGRES_PASSWORD,
                        ipaddress=POSTGRES_ADDRESS,
                        port=POSTGRES_PORT,
                        dbname=POSTGRES_DBNAME))
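If your password contains special characters such as `@` or `:`, plain string formatting can produce an invalid connection string. As an alternative sketch, SQLAlchemy's `URL.create` helper builds the same URL and percent-encodes such characters for you (the credential values below are placeholders):

```python
from sqlalchemy.engine import URL

# Build the connection URL from its parts; URL.create escapes
# special characters in the password automatically.
postgres_url = URL.create(
    drivername='postgresql',
    username='username',     # your Panoply/Postgres username (placeholder)
    password='p@ss word',    # '@' and spaces are handled safely
    host='db.panoply.io',
    port=5439,
    database='database',     # your database name (placeholder)
)
print(postgres_url.host, postgres_url.port)
```

The resulting `URL` object can be passed to `create_engine` in the next step exactly like the string version.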
Initiate the connection to your Panoply data warehouse:
cnx = create_engine(postgres_str)
Collect your data into a pandas DataFrame:
df = pd.read_sql_query('''SELECT * FROM tablename''', cnx)
Note that the example above imports the entire table into memory, so take care to write an SQL query that pulls only the data you want to analyze.
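To illustrate filtering on the database side, here is a self-contained sketch that uses an in-memory SQLite database as a stand-in for the warehouse (the `orders` table and its columns are hypothetical); the same `read_sql_query` pattern applies to your `cnx` engine:

```python
import pandas as pd
from sqlalchemy import create_engine, text

# Stand-in engine so the example runs anywhere; use your `cnx` in practice.
engine = create_engine('sqlite:///:memory:')

# Create a small hypothetical table to query against.
with engine.begin() as conn:
    conn.execute(text('CREATE TABLE orders (id INTEGER, amount REAL)'))
    conn.execute(text(
        'INSERT INTO orders VALUES (1, 10.0), (2, 250.0), (3, 75.5)'))

# Filter in SQL so only matching rows are transferred; the bound
# parameter avoids interpolating values into the query string.
df = pd.read_sql_query(
    text('SELECT id, amount FROM orders WHERE amount > :minimum'),
    engine,
    params={'minimum': 50},
)
print(len(df))  # 2 rows instead of the whole table
```

Pushing the `WHERE` clause to the database keeps memory use proportional to the result set rather than the full table.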