- 07 Jun 2021
- 2 Minutes to read
- Print
- DarkLight
- PDF
Primary Keys
- Updated on 07 Jun 2021
- 2 Minutes to read
- Print
- DarkLight
- PDF
Primary keys are used to ensure the uniqueness of the rows in a table. The Primary Key is comprised of one or more columns that can be used to identify the row in the data warehouse. The process guarantees that no two rows have the same value in their primary key.
When new data is inserted into a table, it is first checked against the existing data to determine if a row with the same value in the primary key already exists. If it does, the new row is used to update the existing row, rather than inserting a new one. However, unlike many data warehouses, Panoply doesn't do that by rejecting attempts to insert the new conflicting row with a unique error. Instead, it uses the new row to update some, or all, of the attributes (columns) in the existing row.
Default Primary Keys
Default primary keys are different based on whether your data warehouse is built on Amazon Redshift or Google BigQuery.
Amazon Redshift
All of the API data sources have a default primary key for each and every table created from them. All the data sources use a field named id
as the default primary key with some exceptions (MongoDB, for example, uses the _id
attribute). If no primary key is given at set up, and there is no default one in the data source (id
field, for example), then a random value is generated instead. To successfully set the primary key, use dynamic fields by placing each attribute name in curly brackets, for example: {first_name}-{last_name}
will use the first_name
and last_name
attributes as the primary key of each row.
Redshift Primary Key Scenarios
Source id column | Enter a primary key | Outcome |
---|---|---|
yes | no | Panoply will automatically select the id column and use it as the primary key. |
yes | yes | Not recommended. Panoply will use the id column but will overwrite the original source values.If you want Panoply to use your source's id column, do not enter a value into the Primary Key field. |
no | no | Panoply creates an id column formatted as a GUID, such as 2cd570d1-a11d-4593-9d29-9e2488f0ccc2 . |
no | yes | Panoply creates a hashed id column using the primary key values entered, while retaining the source columns. |
Google BigQuery
For Panoply on BigQuery, the default primary key will be __panoply_id
. Users may specify a different primary key in the Advanced Settings when configuring the data source, and Panoply will use that to create the __panoply_id
field. If there are any sub-tables, this becomes the foreign key in that sub-table in the format of __{parent_table}_panoply_id
.
Primary Key Scenarios
Source id column | Enter a primary key | Outcome |
---|---|---|
yes | no | Panoply will automatically select the id field and use it to populate the __panoply_id column. |
yes | yes | Panoply will automatically select the id field and use it to populate the __panoply_id . |
no | no | Panoply creates the __panoply_id column formatted as a GUID, such as 2cd570d1-a11d-4593-9d29-9e2488f0ccc2 . |
no | yes | Panoply creates the __panoply_id column using the primary key values entered, while retaining the source columns. |
Any user-entered primary key will be used across all the destination tables.