- In hive-cli, rename a table with the command below.
- About 10 minutes later, it reports an error.
- However, before executing the "rename" command, the directory did not exist in S3, so we did not expect such an error.
- AWS EMR
- AWS S3
- A large table (~600 GiB) resides in S3.
- Ignore the error and wait. The table will eventually be renamed once the Hive metastore finishes renaming all files in S3.
- Alternatively, increase the metastore client socket timeout (hive.metastore.client.socket.timeout) so the client does not give up before the rename completes.
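As a sketch, the timeout can be raised in hive-site.xml; the value below is an example, not a recommendation — pick something larger than your expected rename time:

```xml
<!-- hive-site.xml: raise the metastore client socket timeout -->
<property>
  <name>hive.metastore.client.socket.timeout</name>
  <!-- example value; older releases interpret a bare number as seconds,
       newer releases also accept explicit units such as "1800s" -->
  <value>1800</value>
</property>
```

Since this is a client-side setting, it can usually also be changed for a single session with `set hive.metastore.client.socket.timeout=1800;` in hive-cli before running the rename.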
hive> alter table large_table rename to large_table_bk;
Time taken: 798.823 seconds
- In Hive, renaming a managed table stored in the default warehouse (hive.metastore.warehouse.dir) also renames the underlying directory. For example, in HDFS, from /user/hive/warehouse/table_before to /user/hive/warehouse/table_after.
- The rename operation is done by hive metastore.
- In S3, there is no built-in rename operation, so a rename is actually copy + delete.
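To illustrate the point (this is a toy sketch, not the actual S3 or EMRFS implementation), consider an object store modeled as a dict, where "renaming" a prefix means copying every object to a new key and deleting the old one — so the cost grows with the total data size, not just the number of entries:

```python
# Toy illustration of why "rename" on an object store is copy + delete.
# Sketch only; not how S3/EMRFS is actually implemented.

def rename_prefix(store: dict, old_prefix: str, new_prefix: str) -> int:
    """Rename every object under old_prefix by copying then deleting.

    Returns the number of objects moved. Each move touches the full
    object payload, so total cost is proportional to total data size.
    """
    moved = 0
    for key in [k for k in store if k.startswith(old_prefix)]:
        new_key = new_prefix + key[len(old_prefix):]
        store[new_key] = store[key]   # copy (full object payload)
        del store[key]                # delete the original
        moved += 1
    return moved

store = {
    "warehouse/large_table/part-0": b"aaa",
    "warehouse/large_table/part-1": b"bbb",
}
rename_prefix(store, "warehouse/large_table/", "warehouse/large_table_bk/")
```

A filesystem rename, by contrast, is a metadata-only operation and is O(1) regardless of data size — which is why the same command is instant on HDFS but slow on S3.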
- If the dataset is large (600+ GiB), the metastore may need more than 10 minutes to finish the rename. The socket between hive-cli and the metastore then times out with:
org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
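A rough back-of-the-envelope check shows why a ~600 GiB table blows past the timeout. Both numbers below are assumptions for illustration — the copy throughput is not measured, and the 600-second figure is the commonly cited default for hive.metastore.client.socket.timeout:

```python
# Back-of-the-envelope estimate of server-side rename time vs. the
# client socket timeout. Assumed values, not measurements.

table_size_gib = 600
copy_throughput_gib_per_s = 0.8   # assumed aggregate S3 copy rate
client_timeout_s = 600            # assumed default client socket timeout

rename_time_s = table_size_gib / copy_throughput_gib_per_s
times_out = rename_time_s > client_timeout_s  # client gives up first
```

Under these assumptions the rename takes about 750 seconds, in the same ballpark as the 798.823 seconds observed above, while the client stops waiting at 600 seconds — hence the TTransportException even though the server-side work continues.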
- Monitoring S3, we can see the rename operation is still in progress:
$ aws s3 ls s3://feichashao-hadoop/warehouse/large_table/ --recursive --human-readable --summarize | tail -n2
Total Objects: 2101
Total Size: 237.8 GiB

$ aws s3 ls s3://feichashao-hadoop/warehouse/large_table/ --recursive --human-readable --summarize | tail -n2
Total Objects: 2348
Total Size: 265.8 GiB
- A thread dump of the metastore process shows s3n-worker threads are still running:
"s3n-worker-19" #86 daemon prio=5 os_prio=0 tid=0x00007f8928455000 nid=0x1623 runnable [0x00007f89171d5000]
"s3n-worker-18" #85 daemon prio=5 os_prio=0 tid=0x00007f8928454800 nid=0x1622 runnable [0x00007f89174d6000]
"s3n-worker-17" #84 daemon prio=5 os_prio=0 tid=0x00007f8928453800 nid=0x1621 runnable [0x00007f8917ada000]
- After waiting for some time, we can see the table is renamed successfully despite the error.
- Preserve the location of a table created with the location clause in table rename.