319 Event Data Collection

Edit history

Time Author Version
2017-07-07 16:45 – 16:45 (unknown) r0 – r1
+ 319 Event Data Collection
+
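+ *Purpose
+ For cultural (historical) and intelligence reasons, collect event-related data at scale.
+ Data that raises privacy concerns must be processed or skipped.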
+
+ *Scope (tentative)
+ *PTT post
+ *Can be scraped with tkirby's tool
+ *Facebook post
+ *news report
+ *blog article
+ *video stream record
+ *UStream recorded videos have timestamps
+ *video transcripts
+ *misc video clips
+ *Videos of legislators' interpellation sessions
+ *https://hackpad.com/323-vA3xcpQnSCB
+ *https://www.youtube.com/user/marktwaingroup
+ *YouTube: no timestamp found yet orz
+ *g0v IRC
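+ *padnews: the result of parsing the hackpad live text coverage:
+ *Very bare-bones; if there is a better way to handle this, patches or suggestions are welcome.
+ *cassi ++ a very readable interface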
+ *http://padnews.linode.caasigd.org/
+ *API
+ *latest: http://padnews.linode.caasigd.org/json/
+ *all: http://padnews.linode.caasigd.org/json/all/
+ *single entry: http://padnews.linode.caasigd.org/json/0/
+ *repos
+ *parser: https://github.com/g0v/padnews
+ *cli: https://github.com/g0v/padnews-cli
+ *web: https://github.com/g0v/padnews-web
+ *Print newspapers
+ *photo albums?
+ *Rumors and hearsay?
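
The three padnews JSON endpoints listed above can be addressed with a small helper. This is only a sketch: the URLs are the ones given on this page, but the function names `padnews_url` and `fetch_padnews` are hypothetical, the shape of the returned JSON is not documented here, and the server may no longer be reachable.

```python
import json
from typing import Any, Optional
from urllib.request import urlopen

# Base endpoint as listed on this page ("latest").
BASE_URL = "http://padnews.linode.caasigd.org/json/"


def padnews_url(entry: Optional[int] = None, everything: bool = False) -> str:
    """Build one of the three endpoint URLs listed above.

    - no arguments      -> latest
    - everything=True   -> all entries
    - entry=<index>     -> a single entry
    """
    if everything:
        return BASE_URL + "all/"
    if entry is not None:
        if entry < 0:
            raise ValueError("entry index must be non-negative")
        return f"{BASE_URL}{entry}/"
    return BASE_URL


def fetch_padnews(url: str, timeout: float = 10.0) -> Any:
    """Download and decode a JSON response.

    Assumption: the server answers with UTF-8 encoded JSON. The structure
    of the decoded value is not specified on this page, so it is returned
    unchanged for the caller to inspect.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `fetch_padnews(padnews_url(entry=0))` would request the single-entry URL shown above, provided the service still responds.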
+
+ *Format requirements
+ *Content
+ *Source
+ *Timestamp or time range
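
One possible way to hold the three required fields is sketched below. The class name `EventRecord` and its field names are assumptions made for illustration; this page does not prescribe a schema. A single timestamp is modelled as a record with no `end`, and a time range as a record with both `start` and `end`.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class EventRecord:
    """Hypothetical container for one collected item."""

    content: str                     # the collected text, or a reference to media
    source: str                      # where it came from, e.g. a URL
    start: datetime                  # timestamp, or start of the time range
    end: Optional[datetime] = None   # set only when the item covers a range

    def __post_init__(self) -> None:
        # Reject records that are missing a required field.
        if not self.content or not self.source:
            raise ValueError("content and source are both required")
        # A range must not end before it starts.
        if self.end is not None and self.end < self.start:
            raise ValueError("end must not be earlier than start")

    @property
    def is_range(self) -> bool:
        """True when the record covers a time range rather than an instant."""
        return self.end is not None
```

Using timezone-aware `datetime` values throughout avoids ambiguity when items from different sources are merged.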
+
+ Video: tag/comment by timestamp?
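
If tags or comments are attached to a video by offset, they can be mapped onto wall-clock time once the recording's start time is known. The sketch below assumes that start time is available (the page notes it has not been found for YouTube yet); the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def annotation_time(recording_start: datetime, offset_seconds: float) -> datetime:
    """Convert an offset into the video to an absolute time."""
    if offset_seconds < 0:
        raise ValueError("offset must be non-negative")
    return recording_start + timedelta(seconds=offset_seconds)


def format_offset(offset_seconds: int) -> str:
    """Render an offset in whole seconds as HH:MM:SS for display next to a tag."""
    hours, remainder = divmod(offset_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"
```

Storing the offset rather than the absolute time keeps a tag valid even if the estimated start time of the recording is corrected later.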
+
+ *Tools
+ *https://github.com/zbryikt/ptt-crawler
+ *https://www.npmjs.org/package/streamy-data
+ *https://github.com/g0v/padnews
+
+ *Known Source Sites
+ *http://taiwan0314.s3-website-ap-northeast-1.amazonaws.com/
+ *http://www.appledaily.com.tw/realtimenews/article/new/20140329/369121/
+ *http://time-fumao.rhcloud.com/index.html
+
+ *Application
+ *l km thiann tioh in gin
+ *use file hash to identify duplicated resource?
+ *news archiving
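
The file-hash idea in the list above could look roughly like this. It is a minimal sketch, assuming SHA-256 as the hash (the page does not name one) and hypothetical function names. It only detects byte-identical files; a re-encoded video or a resized photo would not be recognised as a duplicate.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large videos do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths: Iterable[Path]) -> Dict[str, List[Path]]:
    """Group files by content hash and keep only groups with more than one file."""
    groups: Dict[str, List[Path]] = defaultdict(list)
    for path in paths:
        groups[file_sha256(path)].append(path)
    return {digest: files for digest, files in groups.items() if len(files) > 1}
```

Calling `find_duplicates(Path("archive").rglob("*"))` on a directory of downloaded files (after filtering out directories) would return each hash that occurs more than once together with the matching paths; `archive` is a placeholder name.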