Jan 01, 0001

layout: “post” title: “Hello world to Docker Mac” date: “2016-04-15 16:34”

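Docker for Mac has finally arrived. As expected, the experience is really good:

Simpler installation: it is a standard Mac application
Works over VPN without trouble
Native (osxfs) file system sharing (9p is in fact also supported)
The Docker application manages the xhyve VM and restarts it automatically after a configuration change
Fast: the experience is now close to using Docker on Linux
Can coexist with Docker Toolbox: as on Linux, Docker for Mac listens on /var/run/docker.sock, so the client uses its API by default; environment variables can still tell the docker CLI to call the API of another Docker daemon (for example a VM managed by docker-machine)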
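
The endpoint choice described in the last item can be sketched in shell. This is only an illustration, assuming that `DOCKER_HOST` is the sole setting in play: the helper name `docker_endpoint` and the TCP address are hypothetical, and other mechanisms such as the `-H` flag are ignored.

```shell
# Print the endpoint the docker CLI would talk to, assuming only
# DOCKER_HOST is considered (hypothetical helper, not part of Docker).
docker_endpoint() {
  printf '%s\n' "${DOCKER_HOST:-unix:///var/run/docker.sock}"
}

# With DOCKER_HOST unset, the local socket is used.
unset DOCKER_HOST
docker_endpoint

# Pointing the CLI at another daemon, for example a docker-machine VM
# (the address below is made up for illustration).
DOCKER_HOST=tcp://192.168.99.100:2376
docker_endpoint
```

With docker-machine, `eval "$(docker-machine env <name>)"` sets this variable, together with the TLS settings, for the chosen VM.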

... ➦

Jan 01, 0001

layout: post title: Software Engineering at Google date: 2017-02-13 19:36:09 tags: [Google]

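In Software Engineering at Google, Google's Fergus Henderson describes Google's software engineering practices.

Software development

Source repository

A single source repository: apart from core configuration and security-related code, any engineer can access any code and modify it as needed
All development happens on the master branch; release branches are created only at release time
Every subtree of the code has owners, and any change needs an owner's approval

Blaze distributed build system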

... ➦

Jan 01, 0001

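layout: post title: AWS S3 outage review and summary date: 2017-03-03 22:27:50 tags: [aws]

S3 outage review

On February 28, while debugging an S3 billing issue in the Northern Virginia (US-EAST-1) Region, an AWS engineer mistyped a parameter to a playbook command and accidentally removed a large number of S3 control services, causing a 4-hour outage. The mistake affected two core S3 systems: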

... ➦

Jan 01, 0001

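layout: post title: GitLab outage review and summary date: 2017-03-03 22:27:37 tags: []

GitLab outage review

On January 31, while fixing a PostgreSQL replication problem (DB Replication lagged too far behind), GitLab accidentally deleted production data (the plan was to delete the data on the replica db2, but the command turned out to have been run on the primary db1). The team then tried to restore from backups, only to find that there was no up-to-date backup: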

... ➦

Jan 01, 0001

layout: post title: Kubernetes HA date: 2017-03-15 18:12:47 tags: [kubernetes]

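Since Kubernetes 1.5, clusters deployed with kops or kube-up.sh are automatically set up as a highly available system, including

etcd in cluster mode
Load-balanced apiserver
Automatic leader election for controller manager, scheduler and cluster autoscaler (exactly one running instance of each)

As shown in the figure below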

... ➦

Jan 01, 0001

layout: post title: LinuxKit date: 2017-04-19 11:09:53 tags: [docker]

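LinuxKit is a toolkit recently released by Docker for building secure, lean and portable operating systems for containers. From a user-written yaml file (which specifies the kernel and a series of services based on docker images), it automatically builds a virtual machine image for common virtualization or cloud platforms and runs it. Main features include

Stronger security
- System security, based on MirageOS unikernels
- Tracks the latest kernel and trims unnecessary modules
- Immutable: a read-only root file system that can only be generated at build time
- Community collaboration, for example Kernel Self Protection Project (KSPP), Wireguard, Landlock, Mirage, oKernel, Clear Containers and others
Easy to use and extensible
- Every service can be customized, and both user services and system services are based on docker images
- The build process is based on docker
- Generated images are easy to deploy with Infrakit

Installation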

git clone https://github.com/linuxkit/linuxkit $GOPATH/src/github.com/linuxkit/linuxkit
cd $GOPATH/src/github.com/linuxkit/linuxkit
make && make install

How it works

Writing the yaml

LinuxKit needs a yaml file that configures the required services. The available options include

... ➦

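What is the difference between Apache Mesos and Google Kubernetes Jan 01, 0001

Kubernetes is an open source project that brings Google's cluster management tooling to virtual machines and bare metal. It runs well in modern operating system environments (such as CoreOS and Red Hat Atomic) and provides lightweight compute nodes that you can manage. Kubernetes is written in Golang and is lightweight, modular, portable and extensible. We (the Kubernetes development team) are working with a number of technology companies (including Mesosphere, which maintains the Mesos project) to make Kubernetes a standard way of interacting with compute clusters. Kubernetes re-implements the experience Google has accumulated in building cluster applications. These concepts include the following: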

... ➦

awesome quick start Jan 01, 0001

awesome is an excellent window manager for Linux: it is fast and has a clean interface. Installation is also fairly simple:

sudo apt-get install -y awesome awesome-extra gnome-settings-daemon nautilus
sudo apt-get install -y --no-install-recommends gnome-session
mkdir -p ~/.config/awesome

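Common keyboard shortcuts:

Switching between programs
Switch to the next program: Mod4 + j
Switch to the previous program: Mod4 + k
Switch to the first program in the master window: Mod4 + Ctrl + Return

Switching tags
Switch to the previously selected tag: Mod4 + Esc
Switch to a specific tag: Mod4 + 1-9
Switch to the previous tag: Mod4 + Left
Switch to the next tag: Mod4 + Right

Changing window state
Maximize/unmaximize: Mod4 + m
Floating/tiled: Mod4 + Ctrl + Space
Minimize: Mod4 + n
Restore from minimized: Mod4 + Ctrl + n
Close the program: Mod4 + Shift + C

Moving and showing windows
Move to a tag: Mod4 + Shift + 1-9 (or Mod4 + left click on a tag name)
Add to additional tags: Mod4 + Shift + Ctrl + 1-9
Move to the next window's position: Mod4 + Shift + j
Move to the previous window's position: Mod4 + Shift + k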

Changing the layout
Increase the master window width by 5%: Mod4 + l
Decrease the master window width by 5%: Mod4 + h
Switch to the next layout: Mod4 + Space
Switch to the previous layout: Mod4 + Shift + Space

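Window manager control
Restart awesome: Mod4 + Ctrl + r
Quit awesome: Mod4 + Shift + q
Run a command: Mod4 + r
Open the awesome menu: Mod4 + w

Multi-monitor operations
Switch to the next screen: Mod4 + Ctrl + j
Switch to the previous screen: Mod4 + Ctrl + k
Send the program to the next screen: Mod4 + o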

awk examples Jan 01, 0001

precede each line by line number

awk '{print NR, $0}' filename

replace first field by line number

awk '{$1=NR; print}' filename

print field 1 and field 2

awk '{print $1,$2}' filename

print last field

awk '{print $NF}' filename

print non empty lines

awk 'NF>0{print $0}' filename

print if more than 4 fields

awk 'NF>4{print $0}' filename

print matching lines (egrep)

awk '/test.*/{print $0}' filename

print lines where first field matches

awk '$1 ~ /^print.*/{print $0}' filename

calculating sum of field 2

awk 'BEGIN{sum=0}{sum+=$2}END{print sum}' filename
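
As a quick check of the sum one-liner, here is a small sketch that runs it on made-up data fed through a pipe. The function names and sample values are hypothetical. It prints `sum + 0` in the `END` block so that empty input yields 0 rather than an empty line; the `BEGIN{sum=0}` initialization is optional because awk treats an unset variable as zero in arithmetic.

```shell
# Made-up sample data: a name in field 1 and an amount in field 2.
sample_data() {
  printf '%s\n' 'alice 10' 'bob 5' 'alice 7'
}

# Sum field 2 of standard input; sum + 0 forces numeric output even
# when there are no input lines.
sum_field2() {
  awk '{ sum += $2 } END { print sum + 0 }'
}

sample_data | sum_field2      # prints 22
printf '' | sum_field2        # prints 0
```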

for loop

awk '{sum=0; for(i=1;i<=NF;i++)sum+=$i; print sum}' filename

make arrays

awk '{n = split($0, array); print array[1], array[3]} ' filename

reverse a file

awk '{x[NR]=$0} END{for(i=NR;i>0;i--)print x[i]}' filename

Associative Arrays

awk '{amount[$1]=$2} END{for(name in amount) print name, amount[name]}' filename
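
The associative-array one-liner keeps only the last value seen for each key. A common variation, sketched here on made-up data, accumulates a total per key with `+=`. The function names and sample values are hypothetical, and the output is piped through `sort` because awk does not define the iteration order of `for (name in amount)`.

```shell
# Made-up sample data: a name in field 1 and an amount in field 2.
sample_data() {
  printf '%s\n' 'alice 10' 'bob 5' 'alice 7'
}

# Accumulate field 2 per distinct value of field 1, then sort the
# result so the output order is stable.
totals_by_name() {
  awk '{ amount[$1] += $2 }
       END { for (name in amount) print name, amount[name] }' | sort
}

sample_data | totals_by_name
# alice 17
# bob 5
```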

bigdata Jan 01, 0001

Awesome Big Data

A curated list of awesome big data frameworks, resources and other awesomeness. Inspired by awesome-php, awesome-python, awesome-ruby, hadoopecosystemtable & big-data.

Your contributions are always welcome!

Frameworks

Apache Hadoop - framework for distributed processing. Integrates MapReduce (parallel processing), YARN (job scheduling) and HDFS (distributed file system).

Distributed Programming

AddThis Hydra - distributed data processing and storage system originally developed at AddThis.
AMPLab SIMR - run Spark on Hadoop MapReduce v1.
Apache Crunch - a simple Java API for tasks like joining and data aggregation that are tedious to implement on plain MapReduce.
Apache DataFu - collection of user-defined functions for Hadoop and Pig developed by LinkedIn.
Apache Flink - high-performance runtime, and automatic program optimization.
Apache Gora - framework for in-memory data model and persistence.
Apache Hama - BSP (Bulk Synchronous Parallel) computing framework.
Apache MapReduce - programming model for processing large data sets with a parallel, distributed algorithm on a cluster.
Apache Pig - high level language to express data analysis programs for Hadoop.
Apache S4 - framework for stream processing, implementation of S4.
Apache Spark - framework for in-memory cluster computing.
Apache Spark Streaming - framework for stream processing, part of Spark.
Apache Storm - framework for stream processing by Twitter also on YARN.
Apache Tez - application framework for executing a complex DAG (directed acyclic graph) of tasks, built on YARN.
Apache Twill - abstraction over YARN that reduces the complexity of developing distributed applications.
Cascalog - data processing and querying library.
Cheetah - High Performance, Custom Data Warehouse on Top of MapReduce.
Concurrent Cascading - framework for data management/analytics on Hadoop.
Damballa Parkour - MapReduce library for Clojure.
Datasalt Pangool - alternative MapReduce paradigm.
DataTorrent StrAM - real-time engine designed to enable distributed, asynchronous, real-time in-memory big-data computations in as unblocked a way as possible, with minimal overhead and impact on performance.
Facebook Corona - Hadoop enhancement which removes single point of failure.
Facebook Peregrine - Map Reduce framework.
Facebook Scuba - distributed in-memory datastore.
Google Dataflow - create data pipelines to help ingest, transform and analyze data.
Google MapReduce - map reduce framework.
Google MillWheel - fault tolerant stream processing framework.
JAQL - declarative programming language for working with structured, semi-structured and unstructured data.
Kite - a set of libraries, tools, examples, and documentation focused on making it easier to build systems on top of the Hadoop ecosystem.
Metamarkets Druid - framework for real-time analysis of large datasets.
Netflix PigPen - map-reduce for Clojure which compiles to Apache Pig.
Nokia Disco - MapReduce framework developed by Nokia.
Pinterest Pinlater - asynchronous job execution system.
Pydoop - Python MapReduce and HDFS API for Hadoop.
Stratosphere - general purpose cluster computing framework.
Streamdrill - useful for counting activities of event streams over different time windows and finding the most active one.
Twitter Scalding - Scala library for Map Reduce jobs, built on Cascading.
Twitter Summingbird - Streaming MapReduce with Scalding and Storm, by Twitter.
Twitter TSAR - TimeSeries AggregatoR by Twitter.

Distributed Filesystem

Apache HDFS - a way to store large files across multiple machines.
BeeGFS - formerly FhGFS, parallel distributed file system.
Ceph Filesystem - software storage platform.
Disco DDFS - distributed filesystem.
Facebook Haystack - object storage system.
Google Colossus - distributed filesystem (GFS2).
Google GFS - distributed filesystem.
Google Megastore - scalable, highly available storage.
GridGain - GGFS, Hadoop compliant in-memory file system.
Lustre file system - high-performance distributed filesystem.
Quantcast File System QFS - open-source distributed file system.
Red Hat GlusterFS - scale-out network-attached storage file system.
Tachyon - reliable file sharing at memory speed across cluster frameworks.

Document Data Model

Actian Versant - commercial object-oriented database management system.
Crate Data - open source, massively scalable data store that requires zero administration.
Facebook Apollo - Facebook’s Paxos-like NoSQL database.
jumboDB - document oriented datastore over Hadoop.
LinkedIn Espresso - horizontally scalable document-oriented NoSQL data store.
MarkLogic - Schema-agnostic Enterprise NoSQL database technology.
MongoDB - Document-oriented database system.
RavenDB - A transactional, open-source Document Database.
RethinkDB - document database that supports queries like table joins and group by.

Key Map Data Model

Note: There is some term confusion in the industry, and two different things are called “Columnar Databases”. Some, listed here, are distributed, persistent databases built around the “key-map” data model: all data has a (possibly composite) key, with which a map of key-value pairs is associated. In some systems, multiple such value maps can be associated with a key, and these maps are referred to as “column families” (with value map keys being referred to as “columns”).
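
To make the key-map model concrete, here is a toy sketch in shell and awk; it does not reflect how any of these databases is implemented. Each made-up input line holds a key, a column family, a column and a value, and looking up a key together with a family returns that value map. All names and data are hypothetical.

```shell
# Each line: key, column family, column, value (all made up).
cells() {
  printf '%s\n' \
    'user:1 profile name Ada' \
    'user:1 profile city London' \
    'user:1 stats logins 42' \
    'user:2 profile name Bob'
}

# Print the value map (column=value pairs) stored under one key and
# one column family, sorted by column name.
value_map() {
  awk -v key="$1" -v family="$2" \
    '$1 == key && $2 == family { print $3 "=" $4 }' | sort
}

cells | value_map user:1 profile
# city=London
# name=Ada
```

Here `user:1` has two value maps, `profile` and `stats`, which is the situation the note calls "column families".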

... ➦