Greg, thanks for checking.
We needed a short-term solution for a project that is due to go live shortly, and we understand that this would be a major change/addition to Impala's data types and consequently to its code base. So we found a workable solution for now and we are going to "live with it".
While you were out, we also discussed it with Peter Ebert, and I created a new 'Feature Request' with our vendor for a 'TIMESTAMP WITH TIMEZONE' data type in Impala.
For your convenience, this is what I put in the support case:
There are several use cases where the current 'TIMESTAMP' (without timezone) data type in Impala falls short.
One such case is a Kudu replication scenario, where a secondary (replica) instance of the Kudu data has to be updated periodically (say, every 15 minutes) with data from a primary instance. For this case, Todd Lipcon suggested that we add a "timestamp" column in Kudu, called for example 'updated_on', and periodically run something like this:
UPSERT INTO backup_cluster_table SELECT * FROM original_table WHERE updated_on >= $last_update_time
Unfortunately, with Impala's current implementation of the 'TIMESTAMP' data type, a query like the one above would not work during the Daylight Saving Time change in the fall, when we "move the clock" back an hour and local timestamps within the repeated hour become ambiguous (I can send more details about this situation, if needed).
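To make the fall-back case concrete, here is a rough sketch in plain Python (not Impala SQL). It assumes that 'updated_on' and $last_update_time hold local wall-clock values for US Eastern time with no offset attached. The date, the hard-coded offsets and the variable names are only illustrative and are not taken from our actual setup.

```python
from datetime import datetime, timedelta, timezone

# Assumption: US Eastern time; offsets are hard-coded so the sketch has no tz database dependency.
EDT = timezone(timedelta(hours=-4))  # daylight time, in effect before the fall change
EST = timezone(timedelta(hours=-5))  # standard time, in effect after the fall change

# On 2017-11-05 the clock went from 02:00 EDT back to 01:00 EST.
last_sync = datetime(2017, 11, 5, 1, 45, tzinfo=EDT)   # last replication run
row_update = datetime(2017, 11, 5, 1, 10, tzinfo=EST)  # row changed 25 minutes later

# What a timezone-less TIMESTAMP would keep: the wall-clock reading only.
last_update_time = last_sync.replace(tzinfo=None)
updated_on = row_update.replace(tzinfo=None)

# The filter "updated_on >= $last_update_time" on wall-clock values misses the row ...
picked_up_naive = updated_on >= last_update_time

# ... while the same comparison on absolute instants picks it up.
picked_up_absolute = row_update >= last_sync

print("real elapsed time:", row_update - last_sync)
print("picked up with naive timestamps:", picked_up_naive)
print("picked up with absolute timestamps:", picked_up_absolute)
```

In this sketch the row is updated after the last sync in absolute time, but its wall-clock value is earlier than the stored $last_update_time, so the incremental UPSERT would skip it.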
Another important use case for us (and probably not just us) is auditing (regulatory compliance, etc.), where we need to know when, in absolute time, an event occurred. A very good example is customers using Impala/Kudu to store event logs, or for security applications, where knowing the absolute time at which an event occurred is crucial.