Support for Multiple Data Types for the _id Field
Each document in a MongoDB collection includes an _id field that serves as its primary key. MongoDB allows this field to hold values of almost any BSON data type (arrays, for example, are not allowed).
To support this, as of Release 1.61, Hevo has enhanced the queries run on the MongoDB Source so that documents with an _id of any type are retrieved. However, on its own, this change can cause documents to be missed during ingestion or overwritten during loading.
Also note that MongoDB does not support null values for the _id field.
For example:
- During ingestion, the query results for relational operations on the _id field would only include documents whose _id has the same type as the value being compared. Consider a collection having documents with _id of type String as well as Numeric. The documents retrieved for an operation, _id > 100, after sorting on _id would only be ones with a numeric _id. All the documents with a string _id would be excluded.
- During loading to the Destination, documents would get overwritten if the value of the _id field was the same even if the data type was different. Consider two documents, one with _id as numeric 1 and the other with _id as a string, “1”. If either one was loaded first, the other would overwrite it.
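The two problems above can be illustrated with a small simulation. This is only a sketch in plain Python and does not connect to MongoDB or Hevo: the function names find_id_greater_than and load_keyed_by_value, and the sample documents, are hypothetical and chosen for illustration. The first function mimics a comparison that only considers values of the same type as the bound; the second mimics a loader that keys rows on the _id value while ignoring its type.

```python
# Simplified simulation of the two problems; no MongoDB server is involved.
docs = [
    {"_id": 1, "name": "a"},
    {"_id": 150, "name": "b"},
    {"_id": "1", "name": "c"},
    {"_id": "abc", "name": "d"},
]


def find_id_greater_than(documents, bound):
    """Mimic a type-restricted `_id > bound` filter, sorted on _id.

    Only values of the same type as `bound` are compared, so documents
    whose _id has another type never appear in the result.
    """
    matches = [
        d for d in documents
        if type(d["_id"]) is type(bound) and d["_id"] > bound
    ]
    return sorted(matches, key=lambda d: d["_id"])


def load_keyed_by_value(documents):
    """Hypothetical loader that keys rows on the _id value alone."""
    table = {}
    for d in documents:
        # Numeric 1 and string "1" collapse to the same key here,
        # so the document loaded later replaces the earlier one.
        table[str(d["_id"])] = d
    return table


numeric_hits = find_id_greater_than(docs, 100)
table = load_keyed_by_value(docs)

print([d["_id"] for d in numeric_hits])
print(len(docs), "documents read,", len(table), "rows loaded")
```

In this sketch the string _ids never show up in the numeric comparison, and four documents end up as three rows because numeric 1 and string “1” share a key.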
Therefore, the enhanced queries are supported by the following changes:
- To ensure documents are not missed during ingestion, Hevo stores the last read _id for each type, in contrast to the single _id value that was stored previously.
- To pre-empt loading failures, Hevo uses the __hevo_id field to identify the documents to be loaded. The __hevo_id is a string value generated from the hash of the data type of the _id field and its value. For example, suppose the _id value is numeric 1. Then, __hevo_id is the SHA256 hash of “Integer-1”. The documents can be sorted as per their __hevo_id. The _id field still remains the primary key for the table.
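A minimal sketch of both changes, under stated assumptions, may help make them concrete. It is not Hevo's implementation. Only the “Integer-1” input string comes from the text above; the other type labels in TYPE_LABELS, the UTF-8 encoding, the hexadecimal form of the hash, and the helper names hevo_id and update_offsets are assumptions made for illustration.

```python
import hashlib

# Assumed type labels. Only "Integer" appears in the documented example;
# "String" and "Double" are hypothetical.
TYPE_LABELS = {int: "Integer", str: "String", float: "Double"}


def hevo_id(value):
    """Hash "<type label>-<value>" with SHA-256 (hex output is an assumption)."""
    label = TYPE_LABELS[type(value)]
    return hashlib.sha256(f"{label}-{value}".encode("utf-8")).hexdigest()


def update_offsets(offsets, value):
    """Keep the last (largest) _id read so far, tracked separately per type."""
    label = TYPE_LABELS[type(value)]
    current = offsets.get(label)
    if current is None or value > current:
        offsets[label] = value
    return offsets


offsets = {}
for _id in [1, "1", 150, "abc"]:
    update_offsets(offsets, _id)

# Numeric 1 and string "1" now produce different identifiers.
print(hevo_id(1) != hevo_id("1"))
print(offsets)
```

Because the type label is part of the hashed string, two documents whose _id values look alike but differ in type get distinct identifiers, and keeping one offset per type lets ingestion resume for each type independently.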