Installation
On CentOS, install Ruby and the MySQL database.
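Variables
Global variables start with $;
instance variables start with @;
local variables take no prefix and start with a lowercase letter or an underscore.
@cust_id = id   # instance variable
var = "hehe"    # local variable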
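The three kinds of variable can be seen side by side in a minimal sketch. The names `$request_count`, `Customer`, and `describe` are made up for illustration and do not come from these notes.

```ruby
# Hypothetical names throughout; only the $ / @ / no-prefix rules come from the notes.
$request_count = 0            # global variable: visible everywhere

class Customer
  def initialize(id)
    @cust_id = id             # instance variable: one copy per object
  end

  def describe
    label = "customer"        # local variable: exists only inside this method
    $request_count += 1
    "#{label} #{@cust_id}"
  end
end

a = Customer.new(1)
b = Customer.new(2)
puts a.describe               # customer 1
puts b.describe               # customer 2
puts $request_count           # 2
```

Each object keeps its own `@cust_id`, while `$request_count` is shared by every call.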
Methods (functions)
def method_name
  expr..
end
If a method takes no arguments, it can be called by its name alone.
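A short sketch of the calling rules, using two hypothetical methods (`greeting` and `add`) that are not part of these notes:

```ruby
# Hypothetical method names, used only to illustrate the calling rules.
def greeting
  "hello"                       # the last expression is the return value
end

def add(a, b = 10)              # b has a default value
  a + b
end

puts greeting                   # no arguments: the bare name is enough
puts greeting()                 # parentheses are optional
puts add(1, 2)                  # 3
sum = add 1                     # parentheses may be omitted with arguments too
puts sum                        # 11
```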
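Socket
require 'socket'       # TCPSocket lives in the socket standard library

hostname = 'localhost'
port = 2000

s = TCPSocket.open(hostname, port)
while line = s.gets    # read each line from the socket
  puts line.chop       # print it to the terminal
end
s.close                # close the socket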
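The client above needs something listening on the other end. A minimal sketch of a matching server, run in the same process for demonstration, might look like this. It assumes port 0 (so the OS picks a free port) instead of the fixed port 2000, and the message text is invented.

```ruby
require 'socket'

# Listen on a free port chosen by the OS (port 0); the notes use a fixed port 2000.
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]

# Serve a single client in a background thread, then close the connection.
server_thread = Thread.new do
  client = server.accept
  client.puts "hello from server"
  client.puts "bye"
  client.close                  # closing makes the client's gets return nil
end

# Client side: the same read loop as in the notes.
received = []
s = TCPSocket.open('127.0.0.1', port)
while line = s.gets
  received << line.chomp        # chomp removes only the trailing newline
end
s.close

server_thread.join
server.close
puts received.inspect
```

The read loop ends because `gets` returns nil once the server closes its side of the connection.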
HTTP example
require 'socket'

host = 'www.w3cschool.cc'   # web server
port = 80                   # default HTTP port
path = "/index.htm"         # path of the file to fetch

# This is the HTTP request
request = "GET #{path} HTTP/1.0\r\n\r\n"

socket = TCPSocket.open(host, port)  # connect to the server
socket.print(request)                # send the request
response = socket.read               # read the complete response
# Split the response at the first blank line into headers and body
headers, body = response.split("\r\n\r\n", 2)
print body                           # print the result
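The header/body split can be tried without a network connection. The sketch below uses a canned response string (invented for illustration) in place of `socket.read`, and goes one step further by parsing the status line and the header fields.

```ruby
# A canned response stands in for socket.read, so no network access is needed.
response = "HTTP/1.0 200 OK\r\n" \
           "Content-Type: text/html\r\n" \
           "Content-Length: 13\r\n" \
           "\r\n" \
           "<html></html>"

# Same split as in the notes: everything before the first blank line is headers.
head, body = response.split("\r\n\r\n", 2)

# The first header line is the status line; the rest are "Name: value" pairs.
status_line, *header_lines = head.split("\r\n")
_version, code, reason = status_line.split(" ", 3)
headers = header_lines.map { |l| l.split(": ", 2) }.to_h

puts code                       # 200
puts headers["Content-Type"]    # text/html
puts body                       # <html></html>
```

The limit argument `2` matters: it keeps any blank lines inside the body from splitting it further.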
Examples: https://github.com/feichashao/fetch_kw http://rubylearning.com/satishtalim/ruby_socket_programming.html
Regular expressions
line1 = "Cats are smarter than dogs"
line2 = "Dogs also like meat"

if line1 =~ /Cats(.*)/
  puts "Line1 contains Cats"
end
if line2 =~ /Cats(.*)/   # no match: line2 does not contain "Cats"
  puts "Line2 contains Cats"
end
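Beyond a yes/no test, the same operators also give access to the captured groups. A brief sketch follows; the sample markup in `html` is made up.

```ruby
line1 = "Cats are smarter than dogs"

# =~ returns the index of the match (or nil) and sets $1, $2, ... for the groups.
if line1 =~ /Cats (.*) than (.*)/
  puts $1                       # are smarter
  puts $2                       # dogs
end

# String#match returns a MatchData object instead of an index.
m = line1.match(/(\w+) are/)
puts m[1]                       # Cats

# scan collects every match; with one capture group it returns
# an array of one-element arrays.
html = '<a href="/x">ruby</a><a href="/y">socket</a>'   # made-up sample markup
words = html.scan(/<a.*?>(.*?)<\/a>/)
puts words.inspect              # [["ruby"], ["socket"]]
puts words.flatten.inspect      # ["ruby", "socket"]
```

Because `scan` nests each capture in its own array, calling `flatten` is a convenient way to get plain strings.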
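# Put kw into the database
# (kw must first be escaped with the database client's escaping method, otherwise the query is open to SQL injection)
db_result = db.query("INSERT INTO #{KW_TBL_NAME}(keyword) VALUES('#{kw}')")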
# Get more keywords
result_div = /<div id="rs">(.*?)<\/div><div id=/m.match(content)  # match <div id="rs">
return unless result_div.respond_to?("[]")   # match returns nil when nothing is found
result_kw = result_div[1].scan(/<a.*?>(.*?)<\/a>/m)  # match keywords
# Put keywords into to_visit.
if result_kw.respond_to?("each") and @to_visit.length <= MAX_TO_VISIT
  result_kw.each do |rkw|
    @mutex.synchronize { @to_visit << rkw }  # the lock is released even if an exception is raised
    puts "Got kw: #{rkw}\n"
  end
end
Multithreading
t1 = Thread.new{fetch()}
t2 = Thread.new{fetch()}
t3 = Thread.new{fetch()}
t4 = Thread.new{fetch()}
t5 = Thread.new{fetch()}
t1.join
t2.join
t3.join
t4.join
t5.join
Crawler example
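How worker threads like t1 to t5, a mutex, and a shared to_visit list might fit together in a keyword crawler can be sketched as follows. This is not the code of fetch_kw: `fetch_related` is a hypothetical stub that invents keywords instead of querying a search engine, and the limit of 20 is arbitrary.

```ruby
# fetch_related is a hypothetical stub: it invents related keywords instead of
# downloading and parsing a search results page.
def fetch_related(kw)
  ["#{kw}-a", "#{kw}-b"]
end

max_visited = 20                # arbitrary limit for this sketch
to_visit    = ["ruby"]          # shared work list
visited     = []                # keywords already taken by a worker
in_flight   = 0                 # workers currently fetching
mutex       = Mutex.new

workers = Array.new(5) do
  Thread.new do
    loop do
      kw = nil
      done = false
      # Take one keyword under the lock, or decide that the work is finished.
      mutex.synchronize do
        if visited.length >= max_visited || (to_visit.empty? && in_flight.zero?)
          done = true
        elsif (kw = to_visit.shift)
          visited << kw
          in_flight += 1
        end
      end
      break if done
      if kw.nil?                # list is empty but another worker is still fetching
        sleep 0.01
        next
      end
      found = fetch_related(kw) # slow work happens outside the lock
      mutex.synchronize do
        to_visit.concat(found)
        in_flight -= 1
      end
    end
  end
end

workers.each(&:join)            # wait for every worker, like t1.join ... t5.join
puts visited.length
```

Only the short bookkeeping steps hold the lock, so the workers can fetch in parallel.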
Fetch Baidu search results and keywords.
References
http://www.w3cschool.cc/ruby/ruby-tutorial.html