AWS SNS 自定义邮件格式

AWS 的 SES (Simple Email Service) 可以提供邮件收发服务。对于邮件收发的反馈,如 Bounce message, 可以发送到 SNS (Simple Notification Service) 作进一步处理。SNS 可以指定某个邮件地址作为订阅者,将消息发送到该邮箱中。然而,SNS发出来的邮件是 JSON,可读性不好,例如:

{"notificationType":"Delivery","mail":{"timestamp":"2019-02-18T06:03:02.669Z","source":"kfc@feichashao.com","sourceArn":"arn:aws:ses:us-west-2:xxxxxx:identity/feichashao.com","sourceIp":"205.251.234.36","sendingAccountId":"xxxxxx","messageId":"01010168ff335c8d-f00ce1c1-e103-49cd-912f-9f397c7a463c-000000","destination":["feichashao@gmail.com"],"headersTruncated":false,"headers":[{"name":"From","value":"kfc@feichashao.com"},{"name":"To","value":"feichashao@gmail.com"},{"name":"Subject","value":"free kfc"},{"name":"MIME-Version","value":"1.0"},{"name":"Content-Type","value":"text/plain; charset=UTF-8"},{"name":"Content-Transfer-Encoding","value":"7bit"}],"commonHeaders":{"from":["kfc@feichashao.com"],"to":["feichashao@gmail.com"],"subject":"free kfc"}},"delivery":{"timestamp":"2019-02-18T06:03:03.917Z","processingTimeMillis":1248,"recipients":["feichashao@gmail.com"],"smtpResponse":"250 2.0.0 OK  1550469783 q2si13329671plh.79 - gsmtp","remoteMtaIp":"74.125.20.27","reportingMTA":"a27-30.smtp-out.us-west-2.amazonses.com"}}

怎么能让这个提醒邮件变得更加友好呢? SNS目前不支持自定义邮件格式。一个思路是,将 SNS 的消息发送到 Lambda 上,让 Lambda 处理好格式后,再发送到指定邮箱。即 SES -> SNS -> Lambda -> SES.
继续阅读“AWS SNS 自定义邮件格式”

如何使 Hive 对 DynamoDB 进行并发读取?

问题背景

DynamoDB 创建了类似如下的表:

{
"accessid": "c63b88a3-1503-4c2c-a7c2-3a1ffde7bff9", // Primary key.
"date": "2018-12-12",
"when": "2018-12-12 17:22:15",
"context": "something"
}

这个表用来记录访问记录,以 accessid (随机字符串) 作为 Primary key. Date 记录当条记录的生成日期。

我们希望通过 Hive 每天将前一天的数据从 DynamoDB 备份到 S3 上,具体方法可参考文档[1].

hive> INSERT OVERWRITE TABLE  s3_table
    > SELECT * FROM ddb_table WHERE date="2018-12-12";

Dynamodb表里存放了很多天的数据,执行以上操作会消耗很长时间,如何能加快速度?比如让 Hive 进行并发读取?
继续阅读“如何使 Hive 对 DynamoDB 进行并发读取?”

Hive 去除 CSV 字段中的双引号

问题

在 AWS 中,可以开启详细账单的功能。开启详细账单后,AWS 每天会多次将详细的账单数据存入到指定的 S3 bucket 中[1]。
账单数据是一个 CSV 文件,示例如下:

"InvoiceID","PayerAccountId","LinkedAccountId","RecordType","ProductName","RateId","SubscriptionId","PricingPlanId","UsageType","Operation","AvailabilityZone","ReservedInstance","ItemDescription","UsageStartDate","UsageEndDate","UsageQuantity","Rate","Cost"
"Estimated","xxxxxxxxxxxx","xxxxxxxxxxxx","LineItem","Amazon Simple Queue Service","16850885","1846142824","1292565","CNN1-Requests-Tier1","GetQueueAttributes","","N","First 1,000,000 Amazon SQS Requests per month are free","2019-01-01 00:00:00","2019-01-01 01:00:00","60.0000000000","0.0000000000","0.0000000000"
"Estimated","xxxxxxxxxxxx","xxxxxxxxxxxx","LineItem","Amazon Simple Queue Service","16850885","1846142824","1292565","CNN1-Requests-Tier1","GetQueueUrl","","N","First 1,000,000 Amazon SQS Requests per month are free","2019-01-01 00:00:00","2019-01-01 01:00:00","180.0000000000","0.0000000000","0.0000000000"

第一行是每个字段的名字,后面的行是相应的数据。
继续阅读“Hive 去除 CSV 字段中的双引号”

EC2两张网卡连接两个子网,分别关联EIP,其中一个EIP ping不通,怎么办?

问题


- 一个有两张网卡的 EC2 instance (RHEL7.5),每个网卡分别对应一个public subnet, 每个网卡也关联一个 Public IP.
- 从其他网络 ping 这两个 IP 地址,发现其中一个IP能ping通,另一个IP不能ping通。

[ec2-user@ip-172-31-30-14 ~]$ ping 52.80.82.70 -c2
PING 52.80.82.70 (52.80.82.70) 56(84) bytes of data.
64 bytes from 52.80.82.70: icmp_seq=1 ttl=63 time=1.71 ms
64 bytes from 52.80.82.70: icmp_seq=2 ttl=63 time=1.80 ms

[ec2-user@ip-172-31-30-14 ~]$ ping 52.81.2.166 -c2
PING 52.81.2.166 (52.81.2.166) 56(84) bytes of data.

--- 52.81.2.166 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1004ms

继续阅读“EC2两张网卡连接两个子网,分别关联EIP,其中一个EIP ping不通,怎么办?”

为什么 EC2 上的根文件系统在启动时会自动扩大到 EBS 的大小?

问题

- 在 Launch 一个新的 instance 的时候,默认根盘的大小是8G。如果指定更大的大小,比如20G,在启动系统时,我们可以发现根文件系统的大小会自动变成 20G,而无需手动执行 resizefs/xfs_grow 之类的操作。
- 如果后期加大根盘EBS的大小,重新启动系统时,根文件系统也会自动扩大。
- 为什么?是什么软件或者机制实现的这个效果?
继续阅读“为什么 EC2 上的根文件系统在启动时会自动扩大到 EBS 的大小?”