Issue
– In hive-cli, rename a table with the following command:
[cc lang="text"]
hive> alter table large_table_bk rename to large_table;
[/cc]
– About 10 minutes later, it reports an error:
[cc]
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. New location for this table default.large_table already exists : s3://feichashao-hadoop/warehouse/large_table
[/cc]
– However, before executing the "rename" command, the target directory did not exist in S3, so such an error was not expected.
Environment
– AWS EMR
– AWS S3
– Large table (~ 600GiB) resides in S3.
Resolution
– Ignore the error and wait a few minutes. The table will be renamed eventually, once all files in S3 have been renamed by the Hive metastore.
– Or, extend the metastore client socket timeout so the error does not occur:
[cc]
$ hive --hiveconf hive.metastore.client.socket.timeout=1h
hive> alter table large_table rename to large_table_bk;
OK
Time taken: 798.823 seconds
[/cc]
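To make the longer timeout persistent rather than per-session, the same property can be set in hive-site.xml. A sketch (this property accepts time-unit suffixes, so 1h matches the value used above):

```xml
<property>
  <name>hive.metastore.client.socket.timeout</name>
  <value>1h</value>
  <description>Metastore client socket timeout, raised so long-running
  S3 rename (copy + delete) operations do not drop the connection.</description>
</property>
```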
Root Cause
– In Hive, renaming a managed table in the default warehouse (hive.metastore.warehouse.dir) also renames the underlying directory. For example, in HDFS, from /user/hive/warehouse/table_before to /user/hive/warehouse/table_after.
– The rename operation is performed by the Hive metastore.
– In S3, there's no built-in rename operation, so a rename is actually a copy followed by a delete.
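The effect can be sketched locally: treating a temporary directory as the bucket, a prefix "rename" degenerates into one copy plus one delete per object, which is why the time grows with the number and size of objects. Paths and names here are illustrative, not the actual metastore code:

```shell
# Emulate an object-store "rename": no atomic move, only copy + delete per key.
BUCKET=$(mktemp -d)                       # stands in for s3://bucket
mkdir -p "$BUCKET/warehouse/table_before"
echo "row1" > "$BUCKET/warehouse/table_before/part-00000"
echo "row2" > "$BUCKET/warehouse/table_before/part-00001"

SRC="$BUCKET/warehouse/table_before"
DST="$BUCKET/warehouse/table_after"
mkdir -p "$DST"

for obj in "$SRC"/*; do
  cp "$obj" "$DST/$(basename "$obj")"     # one CopyObject per key
  rm "$obj"                               # one DeleteObject per key
done
rmdir "$SRC"

ls "$DST"                                 # lists part-00000 and part-00001
```

With thousands of objects totaling hundreds of GiB, this per-key copy-and-delete loop is what keeps the metastore busy well past the client's socket timeout.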
– If the dataset is large (600+ GiB), the metastore may need more than 10 minutes to finish the rename operation. The socket between hive-cli and the metastore then times out, with a log like:
[cc]
metastore.RetryingMetaStoreClient (RetryingMetaStoreClient.java:invoke(218)) - MetaStoreClient lost connection. Attempting to reconnect (1 of 1) after 1s. alter_table_with_environmentContext
org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
[/cc]
– Monitoring S3, we can see the rename operation is still in progress:
[cc]
$ aws s3 ls s3://feichashao-hadoop/warehouse/large_table/ --recursive --human-readable --summarize | tail -n2
Total Objects: 2101
Total Size: 237.8 GiB
$ aws s3 ls s3://feichashao-hadoop/warehouse/large_table/ --recursive --human-readable --summarize | tail -n2
Total Objects: 2348
Total Size: 265.8 GiB
[/cc]
– s3n-worker threads are still running inside the metastore:
[cc]
$ sudo -u hive jstack 10834 | grep s3n-worker
"s3n-worker-19" #86 daemon prio=5 os_prio=0 tid=0x00007f8928455000 nid=0x1623 runnable [0x00007f89171d5000]
"s3n-worker-18" #85 daemon prio=5 os_prio=0 tid=0x00007f8928454800 nid=0x1622 runnable [0x00007f89174d6000]
"s3n-worker-17" #84 daemon prio=5 os_prio=0 tid=0x00007f8928453800 nid=0x1621 runnable [0x00007f8917ada000]
[/cc]
– After waiting for some time, we can see that the table was renamed successfully despite the error:
[cc]
hive> show tables;
large_table
[/cc]
Reference
[1] Preserve the location of table created with the location clause in table rename
https://issues.apache.org/jira/browse/HIVE-14909