Daru basics

Category: Note Tag: Ruby & Rails Date:

DARU

Basics

Daru is a data-processing library for Ruby that can create and use date-time indexes. Its main structures are:

  • Daru::Vector: one-dimensional
  • Daru::DataFrame: two-dimensional
  • Daru::DateTimeIndex: a date-time index sequence

When creating a one-dimensional or two-dimensional structure, a DateTimeIndex can be used as its index.

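# Load rows ordered by date (DataTable appears to be an ActiveRecord model)
data = DataTable.select("date,val,total_val").order("date asc")

# Build a date-time index and a vector that uses it
data_index = Daru::DateTimeIndex.new(data.collect(&:date))
data_dv = Daru::Vector.new(data.collect(&:val), index: data_index)

# Build a data frame that shares the same index
df = Daru::DataFrame.new(
  {
    val: data.collect(&:val),
    total_val: data.collect(&:total_val)
  },
  index: data_index
)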

Here df.val has the same structure as data_dv.

[9] pry(main)> data_dv
=> #<Daru::Vector(19)>
 2018-03-30T00:00:00+                  1.0
 2018-04-02T00:00:00+               0.9999
 2018-04-27T00:00:00+               0.9908
 2018-05-02T00:00:00+                  1.0
 2018-05-31T00:00:00+               1.0885
 2018-06-01T00:00:00+               1.0833
 2018-06-29T00:00:00+               1.0586
                ...                 ...

Data can be looked up through the date-time index, using a full date or a partial one such as a month.

[14] pry(main)> data_dv['2018-04']
=> #<Daru::Vector(2)>
 2018-04-02T00:00:00+               0.9999
 2018-04-27T00:00:00+               0.9908
[15] pry(main)> data_dv['2018-04-02']
=> 0.9999

[16] pry(main)> data_dv.index
=> #<Daru::DateTimeIndex(19) 2018-03-30T00:00:00+00:00...2018-12-28T00:00:00+00:00>
[17] pry(main)> data_dv.index['2018-04']
=> #<Daru::DateTimeIndex(2) 2018-04-02T00:00:00+00:00...2018-04-27T00:00:00+00:00>

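Special cases when fetching data

When a month contains only one data point, indexing by that month returns the value itself rather than a Vector.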

[19] pry(main)> data_dv['2018-03']
=> 1.0
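
Because a month with a single row comes back as a bare number while other months come back as a Daru::Vector, calling code may want to normalize the two shapes. The following is a minimal sketch, assuming only that the multi-row result responds to to_a (as data_dv.to_a does later in this note). values_for is a hypothetical helper name, and the Hash is a stand-in for data_dv so the snippet runs without the gem.

```ruby
# Hypothetical helper: always return an Array, whether the lookup
# yields a single value or a collection.
def values_for(container, key)
  result = container[key]
  # Floats and Integers do not respond to to_a; Array does, and so
  # should Daru::Vector (assumption based on data_dv.to_a in this note).
  result.respond_to?(:to_a) ? result.to_a : [result]
end

# Stand-in for data_dv, keyed the same way as the pry session above.
sample = {
  '2018-03' => 1.0,              # one row in the month: scalar
  '2018-04' => [0.9999, 0.9908]  # several rows: collection
}

march = values_for(sample, '2018-03')
april = values_for(sample, '2018-04')
puts march.inspect
puts april.inspect
```

With a real vector the call would be values_for(data_dv, '2018-03'), so that downstream code can treat every month as a list.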

Indexing by a month later than the last date raises an error.

[27] pry(main)> data_dv['2019-01']
ArgumentError: bad value for range
from /usr/local/lib/ruby/gems/2.4.0/gems/daru-0.2.1/lib/daru/date_time/index.rb:547:in `slice_between_dates'

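When indexing by a month earlier than the earliest date, any month within the year before that earliest date returns the entire data set.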

[29] pry(main)> data_dv['2018-02']
=> #<Daru::Vector(19)>
 2018-03-30T00:00:00+                  1.0
 2018-04-02T00:00:00+               0.9999
 2018-04-27T00:00:00+               0.9908
 2018-05-02T00:00:00+                  1.0
 2018-05-31T00:00:00+               1.0885
 2018-06-01T00:00:00+               1.0833
 2018-06-29T00:00:00+               1.0586
                ...                 ...

[29] pry(main)> data_dv['2017-03']
=> #<Daru::Vector(19)>
 2018-03-30T00:00:00+                  1.0
 2018-04-02T00:00:00+               0.9999
 2018-04-27T00:00:00+               0.9908
 2018-05-02T00:00:00+                  1.0
 2018-05-31T00:00:00+               1.0885
 2018-06-01T00:00:00+               1.0833
 2018-06-29T00:00:00+               1.0586
                ...                 ...

Indexing by a month more than one year before the earliest date raises an error.

[28] pry(main)> data_dv['2017-02']
ArgumentError: Key 2017-02 is out of bounds
from /usr/local/lib/ruby/gems/2.4.0/gems/daru-0.2.1/lib/daru/date_time/index.rb:362:in `[]'

When indexing by year, an error is raised if that year contains no data.

[36] pry(main)> data_dv['2019']
ArgumentError: Key 2019 is out of bounds
from /usr/local/lib/ruby/gems/2.4.0/gems/daru-0.2.1/lib/daru/date_time/index.rb:362:in `[]'
[37] pry(main)> data_dv['2017']
ArgumentError: Key 2017 is out of bounds
from /usr/local/lib/ruby/gems/2.4.0/gems/daru-0.2.1/lib/daru/date_time/index.rb:362:in `[]'
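
Taken together, these cases mean a partial-date lookup may return a scalar, return the whole vector, or raise ArgumentError, depending on where the key falls relative to the index. One defensive option is to check the key against the actual dates before indexing. The sketch below is plain Ruby and rests on assumptions: covers_period? and safe_lookup are hypothetical helpers, and the dates are expected to be Date or DateTime objects (for example from data_dv.index.to_a, which this note does not confirm).

```ruby
require 'date'

# Hypothetical helper: true if any date falls inside the period named
# by a partial key such as '2018', '2018-04' or '2018-04-02'.
# Keys of any other length raise KeyError.
def covers_period?(dates, key)
  format = { 4 => '%Y', 7 => '%Y-%m', 10 => '%Y-%m-%d' }.fetch(key.length)
  dates.any? { |d| d.strftime(format) == key }
end

# Hypothetical wrapper: index only when the period has data, else nil.
def safe_lookup(container, dates, key)
  covers_period?(dates, key) ? container[key] : nil
end

dates = [Date.new(2018, 3, 30), Date.new(2018, 4, 2), Date.new(2018, 4, 27)]
puts covers_period?(dates, '2018-04')  # a month with data
puts covers_period?(dates, '2018-02')  # before the first date
puts covers_period?(dates, '2019')     # a year with no data
```

Used as safe_lookup(data_dv, dates, '2018-02'), the wrapper would return nil instead of silently handing back all 19 rows, and it would skip the lookups that raise.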

The first element can be fetched directly, but fetching the last element requires converting the vector to an Array first.

[31] pry(main)> data_dv.first
=> 1.0
[33] pry(main)> data_dv.to_a.last
=> 0.8472
[34] pry(main)> data_dv.last
NoMethodError: undefined method `last' for #<Daru::Vector:0x00007fe58a713f28>
from /usr/local/lib/ruby/gems/2.4.0/gems/daru-0.2.1/lib/daru/vector.rb:1420:in `method_missing'

Usage notes

Checking whether data exists for a given day

[48] pry(main)> if data_dv.index.include?('2018-04-02')
[48] pry(main)*   res = data_dv['2018-04-02']
[48] pry(main)* end
=> 0.9999

Getting the data before or after a given date

[53] pry(main)> dates = Daru::Vector.new(data_dv.index)
=> #<Daru::Vector(19)>
                    0 2018-03-30T00:00:00+
                    1 2018-04-02T00:00:00+
                    2 2018-04-27T00:00:00+
                    3 2018-05-02T00:00:00+
                    4 2018-05-31T00:00:00+
                  ...                  ...
[61] pry(main)> bool = dates < '2018-04-30'
=> #<Daru::Core::Query::BoolArray:70311951424740 bool_arry=[true, true, true, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false]>
[62] pry(main)> ids = dates.where(bool)
=> #<Daru::Vector(3)>
                    0 2018-03-30T00:00:00+
                    1 2018-04-02T00:00:00+
                    2 2018-04-27T00:00:00+
[67] pry(main)> ids_index = dates.where(bool).index
=> #<Daru::Index(3): {0, 1, 2}>
[68] pry(main)> new_data = data_dv.at(*ids_index)
=> #<Daru::Vector(3)>
 2018-03-30T00:00:00+                  1.0
 2018-04-02T00:00:00+               0.9999
 2018-04-27T00:00:00+               0.9908
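
The steps above (build a vector of dates, compare, filter, then select by position) boil down to finding the positions of the dates on one side of a cutoff. A minimal plain-Ruby sketch of that step follows. positions_before is a hypothetical helper, and it assumes the index dates are available as Date or DateTime objects.

```ruby
require 'date'

# Hypothetical helper: positions of all dates strictly before the cutoff.
def positions_before(dates, cutoff)
  limit = Date.parse(cutoff)
  dates.each_index.select { |i| dates[i] < limit }
end

dates = [Date.new(2018, 3, 30), Date.new(2018, 4, 2),
         Date.new(2018, 4, 27), Date.new(2018, 5, 2)]
positions = positions_before(dates, '2018-04-30')
puts positions.inspect  # => [0, 1, 2]
```

The resulting positions can then be passed to data_dv.at(*positions), as in the session above. For the data after a date, flip the comparison to >.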