Support for Multiple Data Types for the _id Field
Each document in a MongoDB collection includes the _id field, which serves as its primary key. MongoDB supports all data types for this field.
To support this, as of Release 1.61, Hevo has enhanced the queries run on the MongoDB Source so that _ids of all types are retrieved. However, by itself, this solution can cause documents to be missed during ingestion or overwritten during loading.
Also note that MongoDB does not support null values for the _id field.
For example:
- During ingestion, the query results for relational operations on the _id field would only include documents having an _id of the same type as the value being compared. Consider a collection having documents with _id of type String as well as Numeric. The documents retrieved for an operation, _id > 100, after sorting on _id would only be ones with a numeric _id. All the documents with a string _id would be excluded.
- During loading to the Destination, documents would get overwritten if the value of the _id field was the same even if the data type was different. Consider two documents, one with _id as numeric 1 and the other with _id as a string, “1”. If either one was loaded first, the other would overwrite it.
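Both problems can be reproduced without a database. The following Python sketch is an illustration only, not Hevo's or MongoDB's code: `same_bracket` and `find_gt` are hypothetical helpers that mimic a comparison such as `_id > 100` matching only values of the same type as the operand, and the Destination is modeled as a dictionary keyed on the text form of `_id`, which is an assumption about how a type-unaware load might behave.

```python
# Illustrative simulation only; helper names are hypothetical.

def same_bracket(a, b):
    """Return True if both values are numeric, or both are strings."""
    def is_numeric(v):
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(v, (int, float)) and not isinstance(v, bool)
    if is_numeric(a) and is_numeric(b):
        return True
    return isinstance(a, str) and isinstance(b, str)

def find_gt(docs, bound):
    """Mimic `_id > bound`, sorted on `_id`: only same-type documents can match."""
    matches = [d for d in docs
               if same_bracket(d["_id"], bound) and d["_id"] > bound]
    return sorted(matches, key=lambda d: d["_id"])

# Problem 1: documents with a string _id are skipped by a numeric comparison.
docs = [{"_id": 250}, {"_id": 101}, {"_id": "abc"}, {"_id": "101"}]
ingested = find_gt(docs, 100)

# Problem 2 (assumed Destination behavior): rows keyed on the text form of
# _id collide, so the document loaded later replaces the earlier one.
destination = {}
for doc in [{"_id": 1, "v": "numeric"}, {"_id": "1", "v": "string"}]:
    destination[str(doc["_id"])] = doc
```

In this sketch, `ingested` contains only the two numeric documents, and `destination` ends up with a single row even though two distinct documents were loaded.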
Therefore, the enhanced queries are supported by the following changes:
- To ensure documents are not missed during ingestion, Hevo stores the last read _id for each data type, in contrast to the single _id value stored previously.
- To pre-empt the loading failures, Hevo uses the __hevo_id field to identify the documents to be loaded. The __hevo_id is a string value generated from the hash of the data type of the _id field and its value. For example, suppose the _id value is numeric 1. Then, __hevo_id is the SHA256 hash of “Integer-1”. The documents can be sorted as per their __hevo_id. The _id field still remains the primary key for the table.
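The two changes above can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than Hevo's implementation: the function names `hevo_id` and `update_offsets` are hypothetical, the `TYPE_LABELS` mapping is assumed (only the "Integer" label appears in the description above), and the "type-value" string format is inferred from the “Integer-1” example.

```python
import hashlib

# Assumed type labels; only "Integer" is confirmed by the text above.
TYPE_LABELS = {int: "Integer", str: "String"}

def hevo_id(value):
    """Hash the type label together with the value, e.g. "Integer-1"."""
    label = TYPE_LABELS[type(value)]
    return hashlib.sha256(f"{label}-{value}".encode("utf-8")).hexdigest()

def update_offsets(offsets, value):
    """Track the last read _id separately for each type label."""
    label = TYPE_LABELS[type(value)]
    if label not in offsets or value > offsets[label]:
        offsets[label] = value
    return offsets

# Numeric 1 and string "1" hash different inputs ("Integer-1" vs
# "String-1"), so they no longer collide at the Destination.
numeric_key = hevo_id(1)
string_key = hevo_id("1")

# One offset per type, so string _ids are not skipped after a numeric read.
offsets = {}
for seen in [1, "1", 7, "abc"]:
    update_offsets(offsets, seen)
```

Because the type label is part of the hashed input, two documents whose `_id` values look identical as text still receive distinct identifiers, while each type keeps its own resume point for the next ingestion run.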