PostgreSQL Advanced Settings
  • 28 Nov 2022
  • 5 Minutes to read


Warning:

We do not recommend changing advanced settings unless you are an experienced Panoply user.

For users who have some experience working with their data in Panoply, there are a number of items that can be customized for this data source.

  1. Destination Schema: This is the name of the target schema to save the data. The default schema for data warehouses built on Google BigQuery is panoply. The default schema for data warehouses built on Amazon Redshift is public. This cannot be changed once a source has been collected.
  2. Destination - Panoply selects the default destination for the tables where data is stored.
    • The default destination is postgres_<table or view name>, where <table or view name> is a dynamic field. For example, for a table or view named customers, the default destination table is postgres_customers.
    • To prefix all table names with your own prefix, use this syntax: prefix_<table or view name> where prefix is your desired prefix name and <table or view name> is a variable representing the tables and views to be collected.
  3. Primary Key - The Primary Key is the field or combination of fields that Panoply will use as the deduplication key when collecting data. Panoply sets the primary key depending on the scenario identified in the following table. To learn more about primary keys in general, see Primary Keys.
Postgres id column | Primary key entered | Outcome
yes | no | Panoply automatically selects the id column and uses it as the primary key.
yes | yes | Not recommended. Panoply will use the id column but will overwrite the original source values. If you want Panoply to use your database table's id column, do not enter a value into the Primary Key field.
no | no | Panoply creates an id column formatted as a GUID, such as 2cd570d1-a11d-4593-9d29-9e2488f0ccc2.
no | yes | Panoply creates a hashed id column from the primary key values entered, while retaining the source columns. WARNING: Any user-entered primary key is used across all the Postgres tables selected.
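The last scenario above can be illustrated with a short sketch. Panoply's actual hashing scheme is not documented here; the function name hashed_id, the MD5 digest, and the '|' separator are assumptions for illustration only. The point is that a deterministic hash of the chosen key columns yields a stable deduplication id, so recollecting the same source row updates the existing record instead of duplicating it.

```python
import hashlib

def hashed_id(record, primary_key_columns):
    """Build a deterministic id by hashing the chosen primary-key values.

    Illustration only (hypothetical helper): a composite key becomes a
    stable id, so repeated collections of the same row map to the same id.
    """
    key = "|".join(str(record[c]) for c in primary_key_columns)
    return hashlib.md5(key.encode("utf-8")).hexdigest()

# Two collections of the same source row produce the same id, even if
# non-key columns changed, so the second run overwrites the record.
row_run_1 = {"order_id": 42, "region": "eu", "amount": 99.5}
row_run_2 = {"order_id": 42, "region": "eu", "amount": 120.0}
print(hashed_id(row_run_1, ["order_id", "region"]))
```

Because the hash covers only the key columns, changing a non-key column (amount above) does not change the id.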

4. Logical Replication: By default, Panoply fetches all of your Postgres data on each run. To use Postgres logical replication instead, select this checkbox.
Once selected, choose the publication name and replication slot name to use.
Collecting data using logical replication enables Panoply to automatically identify new and updated records in your Postgres tables and extract only those records.

Requirements:
  1. PostgreSQL server version 10 or higher.
  2. Set wal_level to logical (ALTER SYSTEM SET wal_level = 'logical';), then restart the server for the change to take effect.
  3. Create a publication on specific tables or all tables.
    CREATE PUBLICATION panoply_publication FOR TABLE table1, table2 WITH (publish = 'insert, update');
    or, for all tables:
    CREATE PUBLICATION panoply_publication FOR ALL TABLES WITH (publish = 'insert, update');
  4. Set the replica identity of the desired tables to FULL: ALTER TABLE table1 REPLICA IDENTITY FULL;
  5. Create a Panoply dedicated replication slot.
    SELECT pg_create_logical_replication_slot('panoply_replication_slot', 'pgoutput');
    If you have reached the maximum number of available replication slots (check with SHOW max_replication_slots;), increase max_replication_slots with ALTER SYSTEM SET max_replication_slots = X; and restart the server; this parameter cannot be changed at runtime.
  6. Make sure that the connected user in the Panoply data source has replication permissions: ALTER USER replication_user REPLICATION;
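Taken together, the requirements above amount to the following server-side setup. This is a sketch using the example names from the steps (panoply_publication, panoply_replication_slot, replication_user, table1, table2); run it as a superuser and adjust the names to your environment. Note that the wal_level change only takes effect after a server restart.

```sql
-- 1. Enable logical decoding (requires a server restart).
ALTER SYSTEM SET wal_level = 'logical';

-- 2. Publish the tables to replicate (or use FOR ALL TABLES).
CREATE PUBLICATION panoply_publication
    FOR TABLE table1, table2
    WITH (publish = 'insert, update');

-- 3. Make sure updates carry full row images.
ALTER TABLE table1 REPLICA IDENTITY FULL;
ALTER TABLE table2 REPLICA IDENTITY FULL;

-- 4. Create a dedicated replication slot using the built-in pgoutput plugin.
SELECT pg_create_logical_replication_slot('panoply_replication_slot', 'pgoutput');

-- 5. Grant the connecting user replication permissions.
ALTER USER replication_user REPLICATION;
```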
Warnings:
  • The connected user needs access to the following system tables: pg_publication, pg_publication_tables, pg_replication_slots, pg_class, information_schema.columns, and pg_type.
  • Although it's not mandatory, it is suggested to use a dedicated Panoply publication.
  • Although it's not mandatory, it is suggested to increase max_wal_senders by 1. Use SHOW max_wal_senders; to see the current value, then change it with ALTER SYSTEM SET max_wal_senders = X; followed by a server restart.
  • The replication slots should be used by Panoply's data sources only.
  • Each replication slot should be used in a single data source only.
  • Collecting data using logical replication extracts all changes made in the selected tables. With no primary key defined, Panoply will create duplicates in its tables.
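After setup, the configuration can be verified with a few catalog queries. These use the example publication and slot names from the requirements above, and query only the system tables the connected user already needs access to:

```sql
-- Confirm logical decoding is enabled (should return 'logical').
SHOW wal_level;

-- Confirm the publication exists and see which tables it covers.
SELECT * FROM pg_publication WHERE pubname = 'panoply_publication';
SELECT * FROM pg_publication_tables WHERE pubname = 'panoply_publication';

-- Confirm the slot exists and whether it is currently in use.
SELECT slot_name, plugin, active
  FROM pg_replication_slots
 WHERE slot_name = 'panoply_replication_slot';
```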

5. Incremental Key - By default, Panoply fetches all of your data on each run. If you only want to collect some of your data, enter a column name to use as your incremental key. The column must be logically incremental. Panoply will keep track of the maximum value reached during the previous run and will start there on the next run.

  • Incremental Key configurations
    • If no Incremental Key is configured by the user, by default, Panoply collects all the Postgres data on each run for the tables or views selected.
    • If the Incremental Key is configured by column name, but not the column value, Panoply collects all data, and then automatically configures the column value at the end of a successful run.
    • If the Incremental Key is configured by column name and the column value (manually or automatically), then on the first collection, Panoply will use that value as the place to begin the collection.
      • The value is updated at the end of a successful collection to the last value collected.
      • In future collections, the new value is used as the starting value. So in future collections Panoply looks for data where the IK value is greater than where the collection ended.
  • When an Incremental Key is configured, Panoply will look for that key in each of the selected tables and views. If the table or view does not have the column indicated as the Incremental Key, it must be collected as a separate instance of the data source.
  • A table or view may have records with a ‘null’ value for the incremental key, or records that do not capture the incremental key at all. In these situations, Panoply omits those records instead of failing the entire data source.
Warning:

If you set an incremental key, you can only collect one table per instance of Postgres.
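The incremental-key behavior described above can be sketched as follows. This is illustrative Python, not Panoply's implementation; collect_incremental and the sample rows are invented for this example. It shows the watermark pattern: collect rows past the stored value, skip rows with a null key, and remember the new maximum for the next run.

```python
def collect_incremental(rows, incremental_key, last_value):
    """Sketch of incremental collection (hypothetical, not Panoply's code).

    Returns the rows newer than last_value plus the new maximum, which
    would be stored and used as the starting point of the next run.
    Rows where the key is missing or null are skipped rather than
    failing the whole collection.
    """
    collected = [
        r for r in rows
        if r.get(incremental_key) is not None
        and (last_value is None or r[incremental_key] > last_value)
    ]
    new_max = max((r[incremental_key] for r in collected), default=last_value)
    return collected, new_max

rows = [{"id": 1, "updated_at": 10},
        {"id": 2, "updated_at": 20},
        {"id": 3, "updated_at": None}]

# First run: no stored value yet, so every row with a non-null key is collected.
first, watermark = collect_incremental(rows, "updated_at", None)

# Next run: only rows past the stored watermark are collected.
second, _ = collect_incremental(rows + [{"id": 4, "updated_at": 30}],
                                "updated_at", watermark)
```

The key must be logically incremental for this to work: if older rows can receive larger values than newer ones, the watermark comparison misses data.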

  6. Exclude: The Exclude option allows you to exclude certain data, such as names, addresses, or other personally identifiable information. Enter the column names of the data to exclude.
  7. Parse String: If the data to be collected contains JSON, include the JSON text attributes to be parsed.
  8. Truncate: Truncate deletes all the current data stored in the destination tables, but not the tables themselves. Afterwards Panoply will recollect all the available data for this data source.
  9. Click Save Changes and then Collect.
    • The data source appears grayed out while the collection runs.
    • You may add additional data sources while this collection runs.
    • You can monitor this collection from the Jobs page or the Data Sources page.
    • After a successful collection, navigate to the Tables page to review the data results.
