All talk but no code...

makes lulalala a bluff boy

chef, sunspot solr, and Traditional Chinese word segmentation

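Installing solr on its own is not hard.
Installing solr on its own with Chef is a little bit hard.
Using Chef to install a solr that works with the Chinese word segmentation package mmseg4j is a real minefield.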

--

Cookbooks used:

cookbook 'hipsnip-jetty', git: 'https://github.com/hipsnip-cookbooks/jetty.git'
cookbook 'hipsnip-solr', git: 'https://github.com/hipsnip-cookbooks/solr.git'

Settings:

default_attributes(
  java: {
    jdk_version: "7"
  },
  jetty: {
    port: "8983",
    version: "9.0.3.v20130506",
    link: 'http://eclipse.org/downloads/download.php?file=/jetty/9.0.3.v20130506/dist/jetty-distribution-9.0.3.v20130506.tar.gz&r=1',
    checksum: "eff8c9c63883cae04cec82aca01640411a6f8804971932cd477be2f98f90a6c4"
  },
  solr: {
    version: '4.3.1',
    checksum: '99c27527122fdc0d6eba83ced9598bf5cd3584954188b32cb2f655f1e810886b'
  }
)

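These are the settings that Bert tested and confirmed to work. He could not find a working combination using the official opscode cookbooks.

A word of warning: Solr is very picky about which mmseg4j version it is paired with.
In our tests, Solr 4.2.1 was incompatible with mmseg4j 1.9.1, 2.0.0, and 2.0.1.
Solr 4.3.1 was also incompatible with 2.0.1.
I suggest starting with the combination worked out below, and trying other combinations only if you have time to spare.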

Next, download the mmseg4j files:

The Traditional Chinese dictionary files (units.dic and words.dic) can be downloaded here: http://function1122.blogspot.tw/2010/10/mmseg4j-java-55.html
The 1.9.1 jars can be downloaded here: https://code.google.com/p/mmseg4j/downloads/list

Then write a recipe to upload these files to the remote machine:

directory "#{node['solr']['home']}/lib" do
  owner 'app'
  group 'app'
  action :create
end

%w{mmseg4j-core-1.9.1.jar mmseg4j-solr-1.9.1.jar mmseg4j-analysis-1.9.1.jar}.each do |name|
  cookbook_file "#{node['solr']['home']}/lib/#{name}" do
    owner "app"
    group "app"
    source "solr/#{name}"
  end
end

directory "#{node['solr']['home']}/dic" do
  owner 'app'
  group 'app'
  action :create
end

%w{units.dic words.dic}.each do |name|
  cookbook_file "#{node['solr']['home']}/dic/#{name}" do
    owner "app"
    group "app"
    source "solr/dic/#{name}"
  end
end

Next, configure the solr schema and so on:

solr_schema (used together with sunspot, so I edit the text fieldType directly):

<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="complex" dicPath="/usr/share/solr/dic"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

I specified the dictionary files with an absolute path, because a relative path did not work for some reason.

Add to solr_solrconfig:

  <lib dir="/usr/share/solr/lib/" regex=".*\.jar" />

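because that is where I put mmseg4j.

Then start cooking.

If the installation fails because the jetty user is logged in, log in manually and kick it out with pkill -KILL -u jetty.
Also use a recent version of the hipsnip-solr cookbook, so that it installs the logger libs for you automatically.

I hope your installation succeeds.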

To check whether solr has problems, we temporarily allow the solr web admin page to receive requests inside vagrant:

sudo ufw allow 8983

Now you can check whether the solr settings are correct at http://33.33.33.10:8983/solr/.

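Choose Analysis inside your core, enter "美國是按流量收費所以高速上網容量都會有所限制", and select the type Text (note: not the field Text). "美國" should come out as a single token. If it is split into "美" and "國", the dictionary files were not picked up.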
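The same check can be scripted instead of clicked through. The sketch below only builds the request URL for Solr's field analysis handler. It assumes that handler is registered at /analysis/field in your solrconfig, and the core name `default` is a placeholder rather than something taken from this setup.

```ruby
require 'uri'

# Build the URL for Solr's field analysis request handler.
# base_url matches the vagrant box above; the core name is an assumption.
def analysis_url(text, base_url: 'http://33.33.33.10:8983/solr', core: 'default', field_type: 'text')
  query = URI.encode_www_form(
    'analysis.fieldtype'  => field_type, # analyze by type, not by field name
    'analysis.fieldvalue' => text,       # the sentence to tokenize
    'wt'                  => 'json'
  )
  "#{base_url}/#{core}/analysis/field?#{query}"
end

puts analysis_url('美國是按流量收費所以高速上網容量都會有所限制')
```

If the handler is enabled, fetching the printed URL (for example with curl) should return the tokens produced for the sentence, so you can look for "美國" as one token.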

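Whenever something goes wrong, look at the latest log under /var/log/jetty.

Everything worked when I installed locally with vagrant, but on production solr stubbornly kept running the old 4.2.1 version. In the end I deleted all of the following directories: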

- /usr/local/solr
- /usr/share/solr
- /tmp/jetty*
- /tmp/hsperfdata*

Then I changed the sunspot-solr gem that Rails had been using so that it is only loaded in development.

Thoughts on RubyConf Taiwan 2014

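The auditorium at Yang-Ming University is large, so everyone could easily find a place to sit, but the downside is that it was too dark.

  • The main projection screen was too dim, especially compared with the Twitter wall next to it, so it was a strain to read.
  • The speakers on stage were rarely lit and were often backlit. I imagine the photographers had a hard time getting clear, well-lit photos of the speakers.
  • People in the audience who wanted to code instead of listening easily got sore eyes because there was no light.
  • The mood of the room could not be changed, so the venue felt a bit stuffy and made people sleepy.

On the first day lunch was served slowly and was not filling. Fortunately it was switched to boxed lunches on the second day. However, the venue did not have enough seats for eating, so many people had to eat standing up, which was a hassle. By comparison, the 2012 venue was excellent, because meals were served in a proper dining hall with enough tables and enough food.

The simultaneous interpretation was great. I saw many foreign attendees benefit from it.

At the very end, the COC incident made the closing a bit awkward. Luckily no foreign attendees exposed it on overseas forums, so the controversy stayed within the local community.

Thanks for the hard work, 慕凡 (the organizer) and all the staff. I hope the next one will be even better. Oh, and rehearsing the opening and closing remarks next time would be the finishing touch.

Connecting to a remote mysql with sequel pro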

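I used to use phpmyadmin for occasional database fixes, but installing PHP on a Rails site just for that is of little value, and setting up https for it takes extra effort. So I started looking into using sequel pro on the Mac to connect to the database and work on it directly.

I use the chef mysql cookbook, which by default uses the server's IP address as bind_address. I could not seem to connect that way, so I set it to localhost: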

override_attributes(
  mysql: {
    bind_address: 'localhost'
  }
)

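In sequel pro, choosing the ssh option on the connection page triggers a bug: the ssh user input field is covered up, so it cannot be filled in. After clicking ssh, first fill in a name and click Add to Favorites. The input fields below will then appear.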

MySQL Host: 127.0.0.1
SSH Host: your server's address
SSH User: your ssh user id
Fill in the rest yourself.
Then you can connect.

Yahoo Mail Spam Button Broken

Yahoo Mail redesigned its web interface on October 8, 2013. The new interface relies heavily on AJAX techniques, which improved its response time a lot. However, the new interface also has several bugs, and Yahoo seems to remain oblivious to them. Here I describe the spam button bug and how ineffective the customer support system is.

Not Spam button marks a message as spam

If you view a mail inside the spam folder and hover the cursor over the "Not Spam" button, you will see the tooltip message "Move selected conversation to Spam folder".

Note that this only happens in mail view page, not in the spam folder listing page.

Once you click the "Not Spam" button, you are taken back to the spam listing page. After a few seconds, a blue popup appears at the bottom of the page, saying "Your message has been placed in the Spam folder and sent to Yahoo! for further investigation".

After refreshing the listing page, you see that the supposedly unspammed email is still there.

Once again, this only happens in the mail view page, not the listing page.

Why do I care?

I am a web app programmer. I see that 90% of our spam reports come from Yahoo Mail, and most of them are for new user registration confirmation emails. I asked some users, and they said that they had merely clicked the "Not Spam" button on the registration email.

The faux spam reports are troublesome for us because we use the Sendgrid emailing service, which keeps a reputation score. Every spam report Sendgrid receives lowers our reputation. If the reputation drops too far, Sendgrid will block us from sending emails.

The "Not Spam" bug is bad because once a user clicks the button, Yahoo becomes more convinced that we are spammers, and our emails are more likely to land in the spam folder, which leads to more users clicking the "Not Spam" button. It is a snowball effect.

Unusable customer support

I tried to report this issue, but I couldn't find a way to contact a real person at Yahoo. I found this bug report at Yahoo Feedback. It was reported on October 14th, 2013. However, the admin closed the issue with this reply:

Because this forum is intended to gather feedback/suggestions for Yahoo! Mail, if you continue experiencing this problem, please contact Customer Care by going to:
https://io.help.yahoo.com/contact/index?y=PROD_wieuowiuero&locale=en_US&page=contact&srcContact=acct_care#comm-form
Thank you for using Yahoo! Mail!

I went to that suggested link, which is a Yahoo Help page regarding Yahoo Account issues. I was given two choices: by email or by community.

I chose email and filled in my question.

I later received an automatic reply, but its title suggests the report was filed under the wrong category:

Title: Hacked accounts : spam is being sent from my account [Incident:140108-069640]
Content:
Thanks for contacting Yahoo Customer Care.
Your Incident ID is: 140108-069640
If you're reporting abuse, thanks for improving our community (it means a lot to us). We'll dig in to your report and take care of this. We may contact you if we need more information to complete our investigation.
If you aren't reporting abuse but are trying to ask a question or get help, we'll get back to you as soon as possible.

Later a second email came:

The first step to resolving issues with your internet browser is to make sure you are using the current version. You can download and install the latest versions of Firefox, Safari, or Internet Explorer at the following links:
If updating to the latest browser version does not resolve your issue, try clearing the cache and cookies in your browser. If you do not know how to do this, please visit the Clear Cache and Cookies Wizard.

This is not related to my bug report at all. I was also busy and missed the 72-hour deadline to reply, so the issue was closed.

The second option is to ask for help from the "Community". However, no one seems to be using it, as the last post was made in 2012.

In the email interface, there is a Help link in the config menu. It directs us to the FAQ page. I clicked "Contact Customer Care". There I chose "Errors" -> "My issue does not appear in the list", and I was given the outdated "Community" link again, this time redirected to Yahoo Answers. I don't think bug reports go there.

Conclusion

Hopefully someone at Yahoo will see this and fix the problem. The web interface is broken. The dev/testing team is broken too, having failed to discover or fix the bug for more than 3 months. Lastly, Yahoo really needs to improve its customer support system.

Update 2014/2/12

I sent another email to Yahoo Customer Care, and after a few replies, the Taiwanese branch managed to understand the bug. They told me on 2014/01/23 that they had reported this to the team, and I am still waiting for their reply.

Update 2014/2/24

This morning I checked and found out that the bug has been fixed. The "Not Spam" button is no longer acting as the "Spam" button. Thanks.

Eager loading associations after the fact

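Eager loading is how Rails solves the N+1 problem. With the includes method, the associations are loaded along with the records when reading from the database.

However, includes has to be called before the records are loaded, and sometimes all I have is a bunch of Active Record objects that are already loaded. How do I load each object's association with just one sql query? For an array of ActiveRecord objects, we can do this: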

posts = [a, b, c] # some AR Post objects
ActiveRecord::Associations::Preloader.new(posts, :comments).run()
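
For intuition, here is a plain-Ruby sketch of what preloading does conceptually: one lookup for all the children, grouped by foreign key and handed back to each parent. The Post and Comment structs and COMMENTS_TABLE are made up for illustration, and this is not the Rails implementation. (The Preloader call above matches older Rails versions. Later versions changed this API, so check the one you run.)

```ruby
# Hypothetical stand-ins for ActiveRecord models.
Post    = Struct.new(:id, :comments)
Comment = Struct.new(:id, :post_id, :body)

# Stand-in for the comments table. In Rails the lookup below would be a
# single "SELECT ... WHERE post_id IN (...)" query.
COMMENTS_TABLE = [
  Comment.new(1, 10, 'first'),
  Comment.new(2, 10, 'second'),
  Comment.new(3, 20, 'third')
].freeze

def preload_comments(posts)
  ids  = posts.map(&:id)
  rows = COMMENTS_TABLE.select { |c| ids.include?(c.post_id) } # the one "query"
  by_post = rows.group_by(&:post_id)
  # Hand each already-loaded post its own comments, with no further lookups.
  posts.each { |post| post.comments = by_post.fetch(post.id, []) }
end

posts = [Post.new(10), Post.new(20), Post.new(30)]
preload_comments(posts)
posts.each { |post| puts "#{post.id}: #{post.comments.map(&:body).inspect}" }
```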

Manually skipping asset precompile during a Capistrano deploy

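Asset precompilation during a Rails deploy is very time-consuming, so there are already many ways to shorten it. But since I sometimes need to be able to force a skip, I wrote the following:

# insert into deploy.rb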

callback = callbacks[:after].find{|c| c.source == "deploy:assets:precompile" }
callbacks[:after].delete(callback)
after 'deploy:update_code', 'deploy:assets:precompile' unless fetch(:skip_assets, false)

When you need to force a skip, run cap deploy -S skip_assets=true.


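As an aside, regarding the two current ways of shortening the precompile:

turbo-sprockets-rails3 does shorten the time, but there is still a floor. Even when nothing has changed, compilation still takes about 1 minute.

With https://gist.github.com/xdite/3072362, changing route.rb triggers a recompile. When manually specifying what to compile, you have to add application.rb to the check yourself, and then changing application.rb also triggers a recompile.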

Rails config: my personal recipe

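A Rails project often needs some configuration values. The most common are the keys for various APIs. We usually put these settings in a separate yaml file so they are not recorded in git. I use Settingslogic and find it a great fit, but there are a few tricks in the details that I want to promote:

Make the config a class that works on its own

Booting Rails takes a long time. Under Ruby 1.9 it usually takes tens of seconds. Sometimes a package that could run on its own has to boot all of Rails just to read one setting that lives under Rails. To avoid that, this is how I write my Settingslogic class:

# config/app_config.rb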
require 'settingslogic'
class AppConfig < Settingslogic
  source File.dirname(__FILE__) << "/app_config.yml"
end

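First, Settingslogic is required manually, and the yaml file is referenced by a path relative to app_config.rb (same directory). That way other environments can read the settings with a simple require "./path/to/app_config".

I put my app_config in the config folder, then add an app_config.rb file under config/initializers/ that requires it.

My config/app_config.yml looks like this: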

exception_notification:
  recipients:
    - lulz@example.com
    - lols@example.com
sendgrid:
  user_name: 'foobar'
  password: '2000'
rollbar:
  access_token: dghgkalt4k5439fkgio43pb2

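Of course this file is not put into version control. What goes into version control is config/app_config.yml.example, which contains only the keys and no values, so that everyone knows what needs to be set. I also don't use namespaces or anything like that. Each place has just one set of settings, so simply follow the example.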
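
One possible way to make the example file do some work is to compare its keys with the real file. The sketch below is an assumption of mine, not part of the setup described here: key_paths and missing_keys are hypothetical helpers, and the inline YAML stands in for app_config.yml.example and app_config.yml.

```ruby
require 'yaml'

# Flatten a nested hash into dotted key paths, e.g. "sendgrid.password".
def key_paths(hash, prefix = nil)
  hash.flat_map do |key, value|
    path = [prefix, key].compact.join('.')
    value.is_a?(Hash) ? key_paths(value, path) : [path]
  end
end

# Keys present in the example file but absent from the real one.
def missing_keys(example_yaml, actual_yaml)
  example = YAML.load(example_yaml) || {}
  actual  = YAML.load(actual_yaml) || {}
  key_paths(example) - key_paths(actual)
end

example = <<~YAML
  sendgrid:
    user_name:
    password:
  rollbar:
    access_token:
YAML

actual = <<~YAML
  sendgrid:
    user_name: 'foobar'
    password: '2000'
YAML

p missing_keys(example, actual) # => ["rollbar.access_token"]
```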

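Store domain information in the config file

A site's domain information is an indispensable part of many settings. For example, when using capistrano_nginx I often need settings like .example.com. The domain also changes with the environment: production and staging each have different settings. So I think the config is the most appropriate place for the domain.

# in app_config.rb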
  # returns domain string
  # pass in false for segment key to hide that segment, e.g. `protocol:false`
  def self.domain(params = {})
    params = {protocol:'http'}.merge(params)
    params = self.get('domain_setting').to_hash.merge(params)

    url = ''
    url << params[:protocol].clone << '://' if params[:protocol]
    url << params[:subdomain].clone << '.' if params[:subdomain]
    url << params[:domain]
    url << ':' << params[:port].to_s if params[:port]
    url
  end
# in app_config.yml
# note that my keys are symbols
domain_setting:
  :domain: 'lvh.me'
  :port: 3000

This way, I can call the following to get different forms of the domain:

# in irb
AppConfig.domain #=> "http://lvh.me:3000"
AppConfig.domain(protocol:false) #=> "lvh.me:3000"
AppConfig.domain(protocol:false, subdomain:'admin') #=> "admin.lvh.me:3000"

The parameters here are similar to those of url_for.
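
To try the string-building logic without Settingslogic or Rails, here is a standalone sketch of the same idea. build_domain and DOMAIN_SETTING are hypothetical names. The hash stands in for the domain_setting section of app_config.yml, and this is a simplified rewrite rather than the method above.

```ruby
# Stand-in for the domain_setting section of app_config.yml.
DOMAIN_SETTING = { domain: 'lvh.me', port: 3000 }.freeze

# Build a domain string. Pass false for a segment key to hide that segment.
def build_domain(settings, params = {})
  params = { protocol: 'http' }.merge(settings).merge(params)

  url = ''.dup # unfrozen, so << can append in place
  url << "#{params[:protocol]}://" if params[:protocol]
  url << "#{params[:subdomain]}."  if params[:subdomain]
  url << params[:domain]
  url << ":#{params[:port]}" if params[:port]
  url
end

puts build_domain(DOMAIN_SETTING)                      # => http://lvh.me:3000
puts build_domain(DOMAIN_SETTING, protocol: false)     # => lvh.me:3000
puts build_domain(DOMAIN_SETTING, subdomain: 'assets') # => http://assets.lvh.me:3000
```

Caller-supplied values are merged last, so they override both the default protocol and anything in the settings hash.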

Naturally, I changed production.rb and staging.rb accordingly:

config.asset_host = AppConfig.domain(subdomain:'assets')
config.action_mailer.default_url_options = { :host => AppConfig.domain(protocol:false) }

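For places like staging/development, hard-coding a domain that may change into an rb file and committing it to git feels wrong to begin with, so I always use the config file for this. Another benefit is that changing the domain only requires a change in one place, fully in the lazy spirit of DRY!

Finally, remember to also prepare an app_config.yml.example with only keys and no values, and put it in version control as a reference for those who come later.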