Jan 01, 0001

layout: “post” title: “Hello world to Docker Mac” date: “2016-04-15 16:34”

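Docker for Mac is finally here. As expected, the experience is really good:

  • Installation is simpler: it is a standard Mac application
  • Works over VPN without trouble
  • Native (osxfs) file system sharing (9p is in fact also supported)
  • The Docker application manages the xhyve VM and restarts it automatically after configuration changes
  • Fast: in everyday use it feels little different from Docker on Linux
  • Can coexist with Docker Toolbox: like Docker on Linux, Docker for Mac listens on /var/run/docker.sock, so clients use its API by default; environment variables can still point the docker CLI at the API of another Docker daemon (for example a VM managed by docker-machine)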
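
The coexistence described in the last point can be sketched in shell. `DOCKER_HOST` is the environment variable the docker CLI reads to locate a daemon; the helper name `docker_endpoint`, the machine name `default` and the TCP address are assumptions used only for illustration.

```shell
# Hypothetical helper: report which endpoint the docker CLI would use in
# this shell. With DOCKER_HOST unset, the CLI falls back to the local
# socket that Docker for Mac listens on.
docker_endpoint() {
  echo "${DOCKER_HOST:-unix:///var/run/docker.sock}"
}

# Default: talk to Docker for Mac.
unset DOCKER_HOST
docker_endpoint

# Point the CLI at another daemon, e.g. a docker-machine VM. In practice
# `eval "$(docker-machine env default)"` exports this variable together
# with the TLS settings; the address below is only a placeholder.
export DOCKER_HOST="tcp://192.168.99.100:2376"
docker_endpoint

# Switch back to Docker for Mac.
unset DOCKER_HOST
docker_endpoint
```

Because the choice is made per shell through environment variables, one terminal can target Docker for Mac while another targets a Docker Toolbox VM.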

...
Jan 01, 0001

layout: post title: Software Engineering at Google date: 2017-02-13 19:36:09 tags: [Google]

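In Software Engineering at Google, Google's Fergus Henderson describes Google's software engineering practices.

Software development

Source repository

  • A single source repository; apart from core configuration and security-related code, any engineer can access any code and modify it as needed
  • All development happens on the master branch; release branches are created only at release time
  • Every subtree of the code has owners, and any change requires an owner's approval

Blaze distributed build system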

...
Jan 01, 0001

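layout: post title: AWS S3 outage review and summary date: 2017-03-03 22:27:50 tags: [aws]

S3 outage review

On February 28, while investigating an S3 billing issue in the Northern Virginia (US-EAST-1) Region, an AWS engineer mistyped a parameter of a playbook command and mistakenly removed a large number of S3 control services, causing a four-hour outage. The mistake affected two core S3 systems: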

...
Jan 01, 0001

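layout: post title: GitLab outage review and summary date: 2017-03-03 22:27:37 tags: []

GitLab outage review

On January 31, while fixing a PostgreSQL replication problem (DB Replication lagged too far behind), GitLab accidentally deleted production data (the plan was to delete the data on db2, but the command was run on db1 by mistake). The team then tried to restore from backups, only to find that no up-to-date backup existed: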

...
Jan 01, 0001

layout: post title: Kubernetes HA date: 2017-03-15 18:12:47 tags: [kubernetes]

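Starting with Kubernetes 1.5, clusters deployed with kops or kube-up.sh are automatically set up as a highly available system, including

  • etcd in cluster mode
  • Load balancing for the apiserver
  • Automatic leader election for the controller manager, scheduler and cluster autoscaler (exactly one active instance of each)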
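
The "exactly one active instance" rule can be illustrated with a small lock-based election. This is only a sketch of the idea, not how Kubernetes implements it: the real components compete for a lock record held through the apiserver and renew it periodically, whereas here an atomic `mkdir` on a local directory stands in for that lock. The function name `try_lead`, the candidate names and the lock path are all hypothetical.

```shell
# Stand-in for the shared lock; a real cluster keeps it in the apiserver.
LOCK_DIR="$(mktemp -d)/leader.lock"

# try_lead NAME: the first caller to create LOCK_DIR becomes the leader,
# everyone else stays on standby. mkdir is atomic, so two candidates can
# never both succeed.
try_lead() {
  if mkdir "$LOCK_DIR" 2>/dev/null; then
    echo "$1: leader"
  else
    echo "$1: standby"
  fi
}

try_lead scheduler-a
try_lead scheduler-b
try_lead scheduler-c
```

Releasing the lock (here, `rmdir "$LOCK_DIR"`) lets a standby win the next attempt, which is roughly what happens when a leader stops renewing its lock.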

As shown in the figure below

...
Jan 01, 0001

layout: post title: LinuxKit date: 2017-04-19 11:09:53 tags: [docker]

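LinuxKit is a toolkit recently released by Docker for building secure, lightweight and portable operating systems for containers. From a user-written YAML file (which specifies the kernel and a series of services based on Docker images), it automatically builds a virtual machine image for common virtualization or cloud platforms and boots it. Key features include

  • Stronger security
  • Easy to use and extensible
    • Every service can be customized, and both user services and system services are based on Docker images
    • The build process is based on Docker
    • Generated images are easy to deploy with InfraKit

Installation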

git clone https://github.com/linuxkit/linuxkit $GOPATH/src/github.com/linuxkit/linuxkit
cd $GOPATH/src/github.com/linuxkit/linuxkit
make && make install

How it works

Writing the YAML

LinuxKit requires a YAML file that configures the services you need. The available options include

...
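What is the difference between Apache Mesos and Google's Kubernetes Jan 01, 0001

Kubernetes is an open source project that brings Google's cluster management tooling to virtual machines and bare-metal scenarios. It runs very well on modern operating system environments (such as CoreOS and Red Hat Atomic) that provide lightweight compute nodes managed for you. Kubernetes is written in Golang and is lightweight, modular, portable and extensible. We (the Kubernetes team) are working with a number of technology companies (including Mesosphere, which maintains the Mesos project) to establish Kubernetes as a standard way of interacting with computing clusters. Kubernetes reproduces the experience Google has accumulated in building cluster applications. These concepts include the following: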

...
awesome quick start Jan 01, 0001

awesome is an excellent window manager for Linux, with advantages such as speed and a clean, simple interface. Installation is also fairly simple:

sudo apt-get install -y awesome awesome-extra gnome-settings-daemon nautilus
sudo apt-get install -y --no-install-recommends gnome-session
mkdir -p ~/.config/awesome

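Common keyboard shortcuts:

Switching programs
Switch to the next program: Mod4 + j
Switch to the previous program: Mod4 + k
Switch to the first program in the main window: Mod4 + Ctrl + Return

Switching tags
Switch to the previously selected tag: Mod4 + Esc
Switch to a specific tag: Mod4 + 1-9
Switch to the previous tag: Mod4 + Left
Switch to the next tag: Mod4 + Right

Changing window state
Maximize/unmaximize: Mod4 + m
Floating/tiled: Mod4 + Ctrl + Space
Minimize: Mod4 + n
Restore from minimized: Mod4 + Ctrl + n
Close the program: Mod4 + Shift + C

Moving and displaying windows
Move to a tag: Mod4 + Shift + 1-9 (or Mod4 + left click on a tag name)
Add to additional tags: Mod4 + Shift + Ctrl + 1-9
Move to the position of the next window: Mod4 + Shift + j
Move to the position of the previous window: Mod4 + Shift + k
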
Changing the layout
Increase the current window width by 5%: Mod4 + l
Decrease the current window width by 5%: Mod4 + h
Switch to the next layout: Mod4 + Space
Switch to the previous layout: Mod4 + Shift + Space

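Window management
Restart awesome: Mod4 + Ctrl + r
Quit awesome: Mod4 + Shift + q
Run a command: Mod4 + r
Open the awesome menu: Mod4 + w

Multi-monitor operations
Switch to the next screen: Mod4 + Ctrl + j
Switch to the previous screen: Mod4 + Ctrl + k
Send the program to the next screen: Mod4 + o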
awk examples Jan 01, 0001
  • precede each line by line number
awk '{print NR, $0}' filename
  • replace first field by line number
awk '{$1=NR; print}' filename
  • print field 1 and field 2
awk '{print $1,$2}' filename
  • print last field
awk '{print $NF}' filename
  • print non empty lines
awk 'NF>0{print $0}' filename
  • print if more than 4 fields
awk 'NF>4{print $0}' filename
  • print matching lines (egrep)
awk '/test.*/{print $0}'  filename
  • print lines where first field matches
awk '$1 ~ /^print.*/{print $0}' filename
  • calculating sum of field 2
awk 'BEGIN{sum=0}{sum+=$2}END{print sum}' filename
  • for loop
awk '{sum=0; for(i=1;i<=NF;i++)sum+=$i; print sum}' filename
  • make arrays
awk '{n = split($0, array); print array[1], array[3]} ' filename 
  • reverse a file
awk '{x[NR]=$0} END{for(i=NR;i>0;i--)print x[i]}' filename 
  • Associative Arrays
awk '{amount[$1]=$2} END{for(name in amount) print name, amount[name]}' filename
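
To see a few of these one-liners in action, the sketch below runs them against a small made-up file. The variable name `sample` and the file contents are assumptions used only for illustration.

```shell
# Create a sample file: name in field 1, amount in field 2, one blank line.
sample="$(mktemp)"
printf 'alice 10\nbob 20\n\ncarol 30\n' > "$sample"

# Precede each line by its line number (the blank line is numbered too).
awk '{print NR, $0}' "$sample"

# Print only non-empty lines.
awk 'NF>0{print $0}' "$sample"

# Sum field 2 over all lines; the blank line contributes 0.
awk 'BEGIN{sum=0}{sum+=$2}END{print sum}' "$sample"   # prints 60

# Reverse the file.
awk '{x[NR]=$0} END{for(i=NR;i>0;i--)print x[i]}' "$sample"

# Associative array keyed by name. The iteration order of "for (name in ...)"
# is unspecified, so sort the output when a stable order matters.
awk 'NF>0{amount[$1]=$2} END{for(name in amount) print name, amount[name]}' "$sample" | sort
```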
bigdata Jan 01, 0001

Awesome Big Data

A curated list of awesome big data frameworks, resources and other awesomeness. Inspired by awesome-php, awesome-python, awesome-ruby, hadoopecosystemtable & big-data.

Your contributions are always welcome!

Frameworks

  • Apache Hadoop - framework for distributed processing. Integrates MapReduce (parallel processing), YARN (job scheduling) and HDFS (distributed file system).

Distributed Programming

  • AddThis Hydra - distributed data processing and storage system originally developed at AddThis.
  • AMPLab SIMR - run Spark on Hadoop MapReduce v1.
  • Apache Crunch - a simple Java API for tasks like joining and data aggregation that are tedious to implement on plain MapReduce.
  • Apache DataFu - collection of user-defined functions for Hadoop and Pig developed by LinkedIn.
  • Apache Flink - high-performance runtime, and automatic program optimization.
  • Apache Gora - framework for in-memory data model and persistence.
  • Apache Hama - BSP (Bulk Synchronous Parallel) computing framework.
  • Apache MapReduce - programming model for processing large data sets with a parallel, distributed algorithm on a cluster.
  • Apache Pig - high level language to express data analysis programs for Hadoop.
  • Apache S4 - framework for stream processing, implementation of S4.
  • Apache Spark - framework for in-memory cluster computing.
  • Apache Spark Streaming - framework for stream processing, part of Spark.
  • Apache Storm - framework for stream processing by Twitter also on YARN.
  • Apache Tez - application framework for executing a complex DAG (directed acyclic graph) of tasks, built on YARN.
  • Apache Twill - abstraction over YARN that reduces the complexity of developing distributed applications.
  • Cascalog - data processing and querying library.
  • Cheetah - High Performance, Custom Data Warehouse on Top of MapReduce.
  • Concurrent Cascading - framework for data management/analytics on Hadoop.
  • Damballa Parkour - MapReduce library for Clojure.
  • Datasalt Pangool - alternative MapReduce paradigm.
  • DataTorrent StrAM - real-time engine designed to enable distributed, asynchronous, real-time in-memory big-data computations in as unblocked a way as possible, with minimal overhead and impact on performance.
  • Facebook Corona - Hadoop enhancement which removes single point of failure.
  • Facebook Peregrine - Map Reduce framework.
  • Facebook Scuba - distributed in-memory datastore.
  • Google Dataflow - create data pipelines to ingest, transform and analyze data.
  • Google MapReduce - map reduce framework.
  • Google MillWheel - fault tolerant stream processing framework.
  • JAQL - declarative programming language for working with structured, semi-structured and unstructured data.
  • Kite - a set of libraries, tools, examples, and documentation focused on making it easier to build systems on top of the Hadoop ecosystem.
  • Metamarkets Druid - framework for real-time analysis of large datasets.
  • Netflix PigPen - map-reduce for Clojure which compiles to Apache Pig.
  • Nokia Disco - MapReduce framework developed by Nokia.
  • Pinterest Pinlater - asynchronous job execution system.
  • Pydoop - Python MapReduce and HDFS API for Hadoop.
  • Stratosphere - general purpose cluster computing framework.
  • Streamdrill - useful for counting activities of event streams over different time windows and finding the most active one.
  • Twitter Scalding - Scala library for Map Reduce jobs, built on Cascading.
  • Twitter Summingbird - Streaming MapReduce with Scalding and Storm, by Twitter.
  • Twitter TSAR - TimeSeries AggregatoR by Twitter.

Distributed Filesystem

Document Data Model

  • Actian Versant - commercial object-oriented database management system.
  • Crate Data - an open source, massively scalable data store that requires zero administration.
  • Facebook Apollo - Facebook’s Paxos-like NoSQL database.
  • jumboDB - document oriented datastore over Hadoop.
  • LinkedIn Espresso - horizontally scalable document-oriented NoSQL data store.
  • MarkLogic - Schema-agnostic Enterprise NoSQL database technology.
  • MongoDB - Document-oriented database system.
  • RavenDB - A transactional, open-source Document Database.
  • RethinkDB - document database that supports queries like table joins and group by.

Key Map Data Model

Note: There is some term confusion in the industry, and two different things are called “Columnar Databases”. Some, listed here, are distributed, persistent databases built around the “key-map” data model: all data has a (possibly composite) key, with which a map of key-value pairs is associated. In some systems, multiple such value maps can be associated with a key, and these maps are referred to as “column families” (with value map keys being referred to as “columns”).

...