Data Ingestion
  • 13 Aug 2023


As your data travels from a connector into your Panoply database, it passes through Panoply’s Data Ingestion Engine. The engine enforces a set of constraints and standards and applies several transformations along the way.

For example, you may have three connectors that each format dates differently. As data passes from those sources into your Panoply database, the Data Ingestion Engine standardizes the disparate formats into one consistent date format.

Data Ingestion Engine Specifications

The following sections explain how the Data Ingestion Engine handles dates, timestamps, numbers, and geography.

Dates

Dates are saved in the format YYYY-MM-DDThh:mm:ss.sssZ, which complies with ISO 8601.

Panoply supports these date formats:

Date format | Example
ANSI C | Mon Jan _2 15:04:05 2006
Unix Date | Mon Jan _2 15:04:05 MST 2006
Ruby Date | Mon Jan 02 15:04:05 -0700 2006
RFC 1123 | Mon, 02 Jan 2006 15:04:05 -0700
RFC 3339 (ISO 8601 profile) | 2013-03-31T10:05:04.9385623+03:00
year/month/day | 2013-03-28 10:05:00 +0000 UTC
Date without day | 2014-04
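
To illustrate the idea of standardizing several source formats into the single saved format, here is a minimal Python sketch. It is not Panoply's implementation. The function name `normalize_date`, the list of format strings, the choice to treat values without a zone as UTC, and the choice to map a date without a day to the first of the month are all assumptions made for the example. Only four of the formats listed above are covered.

```python
from datetime import datetime, timezone

# Hypothetical subset of the supported formats, written as strptime patterns.
CANDIDATE_FORMATS = [
    "%a %b %d %H:%M:%S %Y",      # ANSI C
    "%a %b %d %H:%M:%S %z %Y",   # Ruby Date
    "%a, %d %b %Y %H:%M:%S %z",  # RFC 1123 with a numeric zone
    "%Y-%m",                     # date without day
]


def normalize_date(value: str) -> str:
    """Return the value as YYYY-MM-DDThh:mm:ss.sssZ, or raise ValueError."""
    for fmt in CANDIDATE_FORMATS:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue  # not this format, try the next one
        if parsed.tzinfo is None:
            # Assumption: values without a zone are interpreted as UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        utc = parsed.astimezone(timezone.utc)
        millis = utc.microsecond // 1000
        return utc.strftime("%Y-%m-%dT%H:%M:%S") + f".{millis:03d}Z"
    raise ValueError(f"unrecognized date format: {value!r}")


print(normalize_date("Mon Jan 02 15:04:05 -0700 2006"))
print(normalize_date("Mon, 02 Jan 2006 15:04:05 -0700"))
```

Both calls describe the same instant in different source formats, so they print the same normalized string.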

Timestamps

Timestamps are differentiated from dates by precision: a timestamp identifies an exact point in time, with microsecond precision, regardless of location.

Panoply on Redshift supports both string and integer timestamps that are between 8 and 14 bytes long. Longer or shorter values are not treated as timestamps. Timestamp resolution is in seconds, so the Data Ingestion Engine resolves both 1432399705 and 1432399705000 to the same UTC date, 2015-05-23T16:48:25Z.
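
The seconds-versus-milliseconds behavior described here can be sketched in Python as follows. This is an illustration under stated assumptions, not the engine's actual logic: the function name `resolve_epoch` is hypothetical, the length limit is interpreted as a count of digits, and the rule that values longer than 11 digits are milliseconds is a guess chosen only so that the two example values resolve to the same second.

```python
from datetime import datetime, timezone


def resolve_epoch(value) -> str:
    """Resolve an integer or string epoch value to a UTC date at second resolution."""
    digits = str(value).strip()
    # Assumption: the 8-to-14 limit is applied to the number of digits.
    if not digits.isdigit() or not 8 <= len(digits) <= 14:
        raise ValueError(f"not treated as a timestamp: {value!r}")
    number = int(digits)
    # Assumption: more than 11 digits means milliseconds; drop the sub-second part.
    if len(digits) > 11:
        number //= 1000
    moment = datetime.fromtimestamp(number, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


print(resolve_epoch(1432399705))       # 2015-05-23T16:48:25Z
print(resolve_epoch("1432399705000"))  # 2015-05-23T16:48:25Z
```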

Panoply on BigQuery supports 8-byte timestamps in the following format:

YYYY-[M]M-[D]D[( |T)[H]H:[M]M:[S]S[.DDDDDD]][time zone]

  • YYYY: Four-digit year
  • [M]M: One or two-digit month
  • [D]D: One or two-digit day
  • ( |T): Space or a T separator
  • [H]H: One or two-digit hour (valid values from 00 to 23)
  • [M]M: One or two-digit minutes (valid values from 00 to 59)
  • [S]S: One or two-digit seconds (valid values from 00 to 59)
  • [.DDDDDD]: Up to six fractional digits (microsecond precision)
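
As a rough way to check whether a string fits this pattern, the following Python sketch translates the format into a regular expression and applies the stated ranges for hours, minutes, and seconds. The function name `matches_timestamp_format` is hypothetical, and because the accepted time zone spellings are not listed above, the zone part (an optional `Z`, `UTC`, or numeric offset) is an assumption.

```python
import re

TIMESTAMP_PATTERN = re.compile(
    r"(?P<year>\d{4})-(?P<month>\d{1,2})-(?P<day>\d{1,2})"
    r"(?:[ T](?P<hour>\d{1,2}):(?P<minute>\d{1,2}):(?P<second>\d{1,2})"
    r"(?:\.\d{1,6})?)?"                          # up to six fractional digits
    r"(?:\s?(?:Z|UTC|[+-]\d{1,2}(?::\d{2})?))?"  # assumed time zone spellings
)


def matches_timestamp_format(text: str) -> bool:
    """Return True if text fits the pattern and the time fields are in range."""
    match = TIMESTAMP_PATTERN.fullmatch(text)
    if match is None:
        return False
    if match.group("hour") is None:
        return True  # date only, no time part to range-check
    return (
        0 <= int(match.group("hour")) <= 23
        and 0 <= int(match.group("minute")) <= 59
        and 0 <= int(match.group("second")) <= 59
    )


print(matches_timestamp_format("2015-05-23 16:48:25.123456+00:00"))  # True
print(matches_timestamp_format("2015-05-23 24:00:00"))               # False
```

The sketch checks only the shape of the string. It does not verify that the month and day form a real calendar date.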

Numbers

Panoply uses a double-precision floating-point format for numbers. This means the largest integer Panoply can represent exactly is 9,007,199,254,740,991 (2^53 − 1); larger integers may lose precision.
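
The limit follows from how double-precision values store integers, which the short Python sketch below demonstrates using Python's own `float` (also a double). The names `MAX_SAFE_INTEGER` and `survives_double` are introduced for illustration only and do not refer to anything in Panoply.

```python
# Largest integer for which every smaller non-negative integer is also exact in a double.
MAX_SAFE_INTEGER = 2**53 - 1


def survives_double(n: int) -> bool:
    """Return True if n is unchanged after a round trip through a double."""
    return int(float(n)) == n


print(MAX_SAFE_INTEGER)                       # 9007199254740991
print(survives_double(MAX_SAFE_INTEGER))      # True
print(survives_double(MAX_SAFE_INTEGER + 2))  # False: rounds to a neighboring value
```

In practice, identifiers or counters above this value are safer to carry as strings if every digit matters.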

Geography

In Panoply's BigQuery solution, the Data Ingestion Engine automatically identifies WKT (well-known text) geography values and ingests them into BigQuery as the Geography data type. The following geography types are recognized:

  • Point - For example: POINT(1 2)
  • Linestring - For example: LINESTRING(30 10, 10 30, 40 40)
  • Polygon - For example: POLYGON((30 10, 40 40, 20 40, 10 20, 30 10))
  • MultiPoint - For example: MULTIPOINT(10 40, 40 30, 20 20, 30 10)
  • MultiPolygon - For example: MULTIPOLYGON(((30 20, 45 40, 10 40, 30 20)),((15 5, 40 10, 10 20, 5 10, 15 5)))
  • GeometryCollection - For example: GEOMETRYCOLLECTION(POINT(40 10),LINESTRING(10 10, 20 20, 10 40),POLYGON((40 40, 20 45, 45 30, 40 40)))
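
How a value might be recognized as WKT can be sketched with a simple prefix check in Python. This is only an illustration: the function name `wkt_type` and the detection rule (a known type keyword followed by balanced parentheses) are assumptions, and the sketch does not validate the coordinates themselves.

```python
import re

WKT_TYPES = (
    "POINT", "LINESTRING", "POLYGON",
    "MULTIPOINT", "MULTIPOLYGON", "GEOMETRYCOLLECTION",
)
WKT_PREFIX = re.compile(r"\s*(" + "|".join(WKT_TYPES) + r")\s*\(", re.IGNORECASE)


def wkt_type(value: str):
    """Return the WKT type keyword if value looks like WKT, otherwise None."""
    match = WKT_PREFIX.match(value)
    if match is None or not value.rstrip().endswith(")"):
        return None
    depth = 0
    for char in value:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                return None  # a closing parenthesis with no opener
    return match.group(1).upper() if depth == 0 else None


print(wkt_type("POINT(1 2)"))                        # POINT
print(wkt_type("LINESTRING(30 10, 10 30, 40 40)"))   # LINESTRING
print(wkt_type("plain text"))                        # None
```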
