What is the difference between Apache's Mesos and Google's Kubernetes? Jan 01, 0001

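Kubernetes is an open source project that brings Google's cluster management tooling to virtual machines and bare-metal environments. It runs well on modern operating system environments (such as CoreOS and Red Hat Atomic) and provides lightweight compute nodes that you can manage yourself. Kubernetes is written in Golang and is lightweight, modular, portable, and extensible. We (the Kubernetes development team) are working with a number of technology companies (including Mesosphere, which maintains the Mesos project) to make Kubernetes a standard way of interacting with compute clusters. Kubernetes reimplements the experience Google accumulated while building cluster applications. These concepts include the following: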

... ➦

awesome quick start Jan 01, 0001

awesome is an excellent window manager for Linux, with advantages such as speed and a clean interface. Installation is also fairly simple:

sudo apt-get install -y awesome awesome-extra gnome-settings-daemon nautilus
sudo apt-get install -y --no-install-recommends gnome-session
mkdir -p ~/.config/awesome

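Common keyboard shortcuts:

Switching programs
Switch to the next program: Mod4 + j
Switch to the previous program: Mod4 + k
Switch to the first program in the master area: Mod4 + Ctrl + Return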

Switching tags
Switch to the previously selected tag: Mod4 + Esc
Switch to a specific tag: Mod4 + 1-9
Switch to the previous tag: Mod4 + Left
Switch to the next tag: Mod4 + Right

Changing window state
Maximize/unmaximize: Mod4 + m
Floating/tiled: Mod4 + Ctrl + Space
Minimize: Mod4 + n
Restore from minimized: Mod4 + Ctrl + n
Close the program: Mod4 + Shift + c

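Moving and displaying windows
Move to a given tag: Mod4 + Shift + 1-9 (or Mod4 + left click on a tag name)
Also show on additional tags: Mod4 + Shift + Ctrl + 1-9
Move to the next window's position: Mod4 + Shift + j
Move to the previous window's position: Mod4 + Shift + k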

Changing the layout
Increase the master area width by 5% (default binding): Mod4 + l
Decrease the master area width by 5% (default binding): Mod4 + h
Switch to the next layout: Mod4 + Space
Switch to the previous layout: Mod4 + Shift + Space

Window manager control
Restart awesome: Mod4 + Ctrl + r
Quit awesome: Mod4 + Shift + q
Run a command: Mod4 + r
Open the awesome menu: Mod4 + w

Working with multiple monitors
Switch to the next screen: Mod4 + Ctrl + j
Switch to the previous screen: Mod4 + Ctrl + k
Send the program to the next screen: Mod4 + o

awk examples Jan 01, 0001

precede each line by line number

awk '{print NR, $0}' filename

replace first field by line number

awk '{$1=NR; print}' filename

print field 1 and field 2

awk '{print $1,$2}' filename

print last field

awk '{print $NF}' filename

print non empty lines

awk 'NF>0{print $0}' filename

print if more than 4 fields

awk 'NF>4{print $0}' filename

print matching lines (egrep)

awk '/test.*/{print $0}' filename

print lines where first field matches

awk '$1 ~ /^print.*/{print $0}' filename

calculating sum of field 2

awk 'BEGIN{sum=0}{sum+=$2}END{print sum}' filename

for loop

awk '{sum=0; for(i=1;i<=NF;i++)sum+=$i; print sum}' filename

make arrays

awk '{n = split($0, array); print array[1], array[3]}' filename

reverse a file

awk '{x[NR]=$0} END{for(i=NR;i>0;i--)print x[i]}' filename

Associative Arrays

awk '{amount[$1]=$2} END{for(name in amount) print name, amount[name]}' filename
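
To try a few of the one-liners above, here is a small sketch that runs them on made-up sample data. The file contents and the variable names (data, total, last) are assumptions for illustration only.

```shell
# Hypothetical sample data: a name and an amount per line,
# plus one empty line and one line with five fields.
data=$(mktemp)
printf 'alice 10\nbob 20\n\ncarol 30 x y z\n' > "$data"

awk '{print NR, $0}' "$data"     # precede each line by its line number
awk 'NF>0{print $0}' "$data"     # print non-empty lines only
awk 'NF>4{print $0}' "$data"     # print lines with more than 4 fields

# Sum of field 2; the empty line contributes 0.
total=$(awk 'BEGIN{sum=0}{sum+=$2}END{print sum}' "$data")
echo "sum of field 2: $total"

# Last field of the final line.
last=$(awk '{print $NF}' "$data" | tail -n 1)
echo "last field of last line: $last"

rm -f "$data"
```

Keeping the data in a temporary file mirrors the `filename` argument used throughout the examples; the same commands also work when reading from a pipe.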

bigdata Jan 01, 0001

Awesome Big Data

A curated list of awesome big data frameworks, resources and other awesomeness. Inspired by awesome-php, awesome-python, awesome-ruby, hadoopecosystemtable & big-data.

Your contributions are always welcome!

Awesome Big Data
Other Awesome Lists

Frameworks

Apache Hadoop - framework for distributed processing. Integrates MapReduce (parallel processing), YARN (job scheduling) and HDFS (distributed file system).

Distributed Programming

AddThis Hydra - distributed data processing and storage system originally developed at AddThis.
AMPLab SIMR - run Spark on Hadoop MapReduce v1.
Apache Crunch - a simple Java API for tasks like joining and data aggregation that are tedious to implement on plain MapReduce.
Apache DataFu - collection of user-defined functions for Hadoop and Pig developed by LinkedIn.
Apache Flink - high-performance runtime, and automatic program optimization.
Apache Gora - framework for in-memory data model and persistence.
Apache Hama - BSP (Bulk Synchronous Parallel) computing framework.
Apache MapReduce - programming model for processing large data sets with a parallel, distributed algorithm on a cluster.
Apache Pig - high level language to express data analysis programs for Hadoop.
Apache S4 - framework for stream processing, implementation of S4.
Apache Spark - framework for in-memory cluster computing.
Apache Spark Streaming - framework for stream processing, part of Spark.
Apache Storm - framework for stream processing by Twitter, also on YARN.
Apache Tez - application framework for executing a complex DAG (directed acyclic graph) of tasks, built on YARN.
Apache Twill - abstraction over YARN that reduces the complexity of developing distributed applications.
Cascalog - data processing and querying library.
Cheetah - High Performance, Custom Data Warehouse on Top of MapReduce.
Concurrent Cascading - framework for data management/analytics on Hadoop.
Damballa Parkour - MapReduce library for Clojure.
Datasalt Pangool - alternative MapReduce paradigm.
DataTorrent StrAM - real-time engine designed to enable distributed, asynchronous, real-time in-memory big-data computations in as unblocked a way as possible, with minimal overhead and impact on performance.
Facebook Corona - Hadoop enhancement which removes single point of failure.
Facebook Peregrine - Map Reduce framework.
Facebook Scuba - distributed in-memory datastore.
Google Dataflow - create data pipelines to help ingest, transform and analyze data.
Google MapReduce - map reduce framework.
Google MillWheel - fault tolerant stream processing framework.
JAQL - declarative programming language for working with structured, semi-structured and unstructured data.
Kite - a set of libraries, tools, examples, and documentation focused on making it easier to build systems on top of the Hadoop ecosystem.
Metamarkets Druid - framework for real-time analysis of large datasets.
Netflix PigPen - map-reduce for Clojure which compiles to Apache Pig.
Nokia Disco - MapReduce framework developed by Nokia.
Pinterest Pinlater - asynchronous job execution system.
Pydoop - Python MapReduce and HDFS API for Hadoop.
Stratosphere - general purpose cluster computing framework.
Streamdrill - useful for counting activities of event streams over different time windows and finding the most active one.
Twitter Scalding - Scala library for Map Reduce jobs, built on Cascading.
Twitter Summingbird - Streaming MapReduce with Scalding and Storm, by Twitter.
Twitter TSAR - TimeSeries AggregatoR by Twitter.

Distributed Filesystem

Apache HDFS - a way to store large files across multiple machines.
BeeGFS - formerly FhGFS, parallel distributed file system.
Ceph Filesystem - software storage platform.
Disco DDFS - distributed filesystem.
Facebook Haystack - object storage system.
Google Colossus - distributed filesystem (GFS2).
Google GFS - distributed filesystem.
Google Megastore - scalable, highly available storage.
GridGain - GGFS, Hadoop compliant in-memory file system.
Lustre file system - high-performance distributed filesystem.
Quantcast File System QFS - open-source distributed file system.
Red Hat GlusterFS - scale-out network-attached storage file system.
Tachyon - reliable file sharing at memory speed across cluster frameworks.

Document Data Model

Actian Versant - commercial object-oriented database management system.
Crate Data - an open source, massively scalable data store that requires zero administration.
Facebook Apollo - Facebook’s Paxos-like NoSQL database.
jumboDB - document oriented datastore over Hadoop.
LinkedIn Espresso - horizontally scalable document-oriented NoSQL data store.
MarkLogic - Schema-agnostic Enterprise NoSQL database technology.
MongoDB - Document-oriented database system.
RavenDB - A transactional, open-source Document Database.
RethinkDB - document database that supports queries like table joins and group by.

Key Map Data Model

Note: There is some term confusion in the industry, and two different things are called “Columnar Databases”. Some, listed here, are distributed, persistent databases built around the “key-map” data model: all data has a (possibly composite) key, with which a map of key-value pairs is associated. In some systems, multiple such value maps can be associated with a key, and these maps are referred to as “column families” (with value map keys being referred to as “columns”).

... ➦

cannot change locale Jan 01, 0001

Run the locale command:
LANG=
LANGUAGE=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

Edit the profile:

vi /etc/profile

Add the following:

export LC_ALL=en_US.UTF-8

source /etc/profile

This produces the error: setlocale: LC_ALL: cannot change locale (en_US.UTF-8): No such file or directory
Then run dpkg-reconfigure locales

This produces the following errors:

perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = (unset),
        LC_ALL = "en_US.UTF-8",
        LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
/bin/bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
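
The "No such file or directory" message usually means the requested locale has not been generated on the system. A minimal sketch for checking that, assuming a glibc system where `locale -a` lists the installed locales; `locale_listed` is a hypothetical helper, not part of any standard tool.

```shell
# Hypothetical helper: succeed if the locale name in $1 appears in the
# list read from stdin. locale -a typically prints the codeset as "utf8",
# so names are compared lower-cased and with hyphens removed.
locale_listed() {
    want=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -d '-')
    tr '[:upper:]' '[:lower:]' | tr -d '-' | grep -qxF "$want"
}

# Check the locale that /etc/profile tries to set.
if locale -a 2>/dev/null | locale_listed en_US.UTF-8; then
    echo "en_US.UTF-8 is installed"
else
    echo "en_US.UTF-8 is not installed"
fi
```

If the locale is reported as not installed, exporting LC_ALL in /etc/profile cannot take effect until that locale has been generated.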

... ➦

Deploy a Mesos Cluster Using Docker Jan 01, 0001

This tutorial will show you how to bring up a single-node Mesos cluster provisioned entirely with Docker containers (a future post will show how to easily scale this out to multiple nodes, or see the update at the bottom). This means that you can start up an entire cluster with 7 commands! There is nothing to install other than a working Docker server to start from.

This will start up 4 containers:

ZooKeeper
Mesos Master
Marathon
Mesos Slave Container

As mentioned, the only prerequisite is a working Docker server. This means you can bring up a local Vagrant box with Docker installed, use Boot2Docker, use CoreOS, launch an instance on AWS, or get a Docker server however you like.

... ➦

Dive into Linux capabilities Jan 01, 0001

Introduction

Capabilities in Linux are flags that tell the kernel what an application is allowed to do. If you have no additional security mechanism in place, the Linux root user has all capabilities assigned to it. Because capabilities let processes run with some privileges without being granted full root privileges, it is important to understand that they exist.

Consider the ping utility. It is marked setuid root on some distributions, because the utility requires the (cap)ability to send raw packets. This capability is known as CAP_NET_RAW. However, thanks to capabilities, you can now mark the ping application with this capability and drop the setuid bit from the file. As a result, the application no longer runs with full root privileges, but with the restricted privileges of the user plus one capability, namely CAP_NET_RAW.
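
The ping example can be sketched in shell. The kernel reports each process's capability sets as hexadecimal bitmasks in /proc/<pid>/status, and CAP_NET_RAW is bit 13. The helper name `has_cap` and the /bin/ping path below are assumptions for illustration.

```shell
# CAP_NET_RAW is capability number 13 in linux/capability.h.
CAP_NET_RAW=13

# Hypothetical helper: succeed if bit $2 is set in the hex capability
# mask $1 (the format used by the Cap* fields in /proc/<pid>/status).
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Inspect the effective capability set of the current shell, if /proc exists.
if [ -r "/proc/$$/status" ]; then
    mask=$(awk '/^CapEff:/ {print $2}' "/proc/$$/status")
    if has_cap "$mask" "$CAP_NET_RAW"; then
        echo "CAP_NET_RAW is in the effective set"
    else
        echo "CAP_NET_RAW is not in the effective set"
    fi
fi

# Granting the capability to a binary needs root and the libcap tools
# (the path is an assumption; adjust it to where ping lives on your system):
#   setcap cap_net_raw+ep /bin/ping
#   getcap /bin/ping
```

Run as an ordinary user, the check typically reports that the capability is absent; run as root without further restrictions, it reports that it is present.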

... ➦

Docker Jan 01, 0001

Introduction

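Docker is an open source engine that dotCloud announced just a few months ago. It aims to provide an automated deployment solution for applications. Put simply, it quickly creates a container (similar to a virtual machine) on a Linux system and deploys and runs applications inside that container, and configuration files make it easy to automate installing, deploying, and upgrading applications. Because containers are used, production and development environments can easily be kept separate without affecting each other, which is the most common way Docker is used. Other uses include large-scale web applications, database deployment, continuous deployment, clusters, test environments, service-oriented cloud computing, virtual desktops (VDI), and so on.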

... ➦

Docker acquires SDN startup SocketPlane Jan 01, 0001

At SocketPlane we started out as four guys with a collectively strong belief in open source and open communities. We aligned around a shared vision that we wanted to be a critical part of Docker’s once-in-a-decade disruption. Now that we are part of the Docker team, we couldn’t be happier.

We never looked to hedge our bets; our success was, and obviously still is, tied to the success of Docker. While there are many reasons that we decided to join the team, first and foremost Docker is unlike any other project we have worked on in the past; the focus on user experience and simplicity is unmatched. Our early work with Docker during the open network design sprints gave us clear indications that the Docker maintainers were interested in being good open source stewards for the networking community in a project with an already staggering community of users and contributors. We also saw a genuine desire from Docker leadership to do right by both individual contributors and the ecosystem. That made it all the easier to jump in head first.

... ➦

docker in tencent Jan 01, 0001

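Tencent uses Docker extensively internally. Its YARN-based scheduling platform, code-named Gaia, supports both Docker and non-Docker applications and provides high-concurrency task scheduling and resource management. It is highly scalable and reliable, and can support offline workloads such as MapReduce (MR).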

... ➦