TF Learn: A Powerful Deep Learning Tool Based on Scikit-learn and TensorFlow
[Original 51CTO.com article] Anyone familiar with the overseas data-science market knows that the three technologies most widely used in data science abroad in 2017 were Spark, Python, and MongoDB. Speaking of Python, no one working in big data is a stranger to Scikit-learn and Pandas.
Scikit-learn is the most widely used Python machine-learning framework; algorithm engineers at the major internet companies almost invariably reach for it when implementing single-machine versions of their algorithms. TensorFlow is even more famous: it is hard to imagine anyone doing deep learning who does not know it.
Let's start with a sample: an implementation of logistic regression, a classic machine-learning algorithm.
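The article's original snippet did not survive in this copy; as a stand-in, here is a minimal sketch of what a three-line Scikit-learn logistic regression typically looks like (the bundled iris dataset is an assumption used here for illustration, not the article's data):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Any feature matrix X and label vector y would do; iris is a convenient stand-in
X, y = load_iris(return_X_y=True)

# The core workflow is just three lines: create, fit, predict
model = LogisticRegression(max_iter=1000)
model.fit(X, y)
print(model.predict(X[:5]))
```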
As you can see, the sample accomplishes the core of logistic regression in just 3 lines of code. How many lines would the same task take in TensorFlow? The following code comes from GitHub:
'''
A logistic regression learning algorithm example using TensorFlow library.
This example is using the MNIST database of handwritten digits
(http://yann.lecun.com/exdb/mnist/)
Author: Aymeric Damien
Project: https://github.com/aymericdamien/TensorFlow-Examples/
'''
from __future__ import print_function

import tensorflow as tf

# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

# Parameters
learning_rate = 0.01
training_epochs = 25
batch_size = 100
display_step = 1

# tf Graph Input
x = tf.placeholder(tf.float32, [None, 784])  # mnist data image of shape 28*28=784
y = tf.placeholder(tf.float32, [None, 10])   # 0-9 digits recognition => 10 classes

# Set model weights
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

# Construct model
pred = tf.nn.softmax(tf.matmul(x, W) + b)  # Softmax

# Minimize error using cross entropy
cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(pred), reduction_indices=1))
# Gradient Descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()

# Start training
with tf.Session() as sess:
    # Run the initializer
    sess.run(init)

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([optimizer, cost], feed_dict={x: batch_xs,
                                                          y: batch_ys})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if (epoch+1) % display_step == 0:
            print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(avg_cost))

    print("Optimization Finished!")

    # Test model
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    print("Accuracy:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))
A relatively simple machine-learning algorithm takes a substantial amount of code to implement in TensorFlow. Scikit-learn, on the other hand, lacks TensorFlow's rich deep-learning capabilities. Is there a way to keep Scikit-learn's simplicity and ease of use while giving it TensorFlow-style support for deep learning? There is: the open-source Scikit-Flow project, which was later merged into the TensorFlow codebase and became today's TF Learn module.
Let's look at a TF Learn example that implements linear regression:
""" Linear Regression Example """
from __future__ import absolute_import, division, print_function

import tflearn

# Regression data
X = [3.3, 4.4, 5.5, 6.71, 6.93, 4.168, 9.779, 6.182, 7.59, 2.167,
     7.042, 10.791, 5.313, 7.997, 5.654, 9.27, 3.1]
Y = [1.7, 2.76, 2.09, 3.19, 1.694, 1.573, 3.366, 2.596, 2.53, 1.221,
     2.827, 3.465, 1.65, 2.904, 2.42, 2.94, 1.3]

# Linear Regression graph
input_ = tflearn.input_data(shape=[None])
linear = tflearn.single_unit(input_)
regression = tflearn.regression(linear, optimizer='sgd', loss='mean_square',
                                metric='R2', learning_rate=0.01)
m = tflearn.DNN(regression)
m.fit(X, Y, n_epoch=1000, show_metric=True, snapshot_epoch=False)

print("\nRegression result:")
print("Y = " + str(m.get_weights(linear.W)) +
      "*X + " + str(m.get_weights(linear.b)))

print("\nTest prediction for x = 3.2, 3.3, 3.4:")
print(m.predict([3.2, 3.3, 3.4]))
As we can see, TF Learn inherits Scikit-learn's concise programming style and is very convenient for traditional machine-learning methods. Next, here is a TF Learn example implementing a CNN on the MNIST dataset:
""" Convolutional Neural Network for MNIST dataset classification task.
References:
    Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. "Gradient-based
    learning applied to document recognition." Proceedings of the IEEE,
    86(11):2278-2324, November 1998.
Links:
    [MNIST Dataset] http://yann.lecun.com/exdb/mnist/
"""
from __future__ import division, print_function, absolute_import

import tflearn
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.normalization import local_response_normalization
from tflearn.layers.estimator import regression

# Data loading and preprocessing
import tflearn.datasets.mnist as mnist
X, Y, testX, testY = mnist.load_data(one_hot=True)
X = X.reshape([-1, 28, 28, 1])
testX = testX.reshape([-1, 28, 28, 1])

# Building convolutional network
network = input_data(shape=[None, 28, 28, 1], name='input')
network = conv_2d(network, 32, 3, activation='relu', regularizer="L2")
network = max_pool_2d(network, 2)
network = local_response_normalization(network)
network = conv_2d(network, 64, 3, activation='relu', regularizer="L2")
network = max_pool_2d(network, 2)
network = local_response_normalization(network)
network = fully_connected(network, 128, activation='tanh')
network = dropout(network, 0.8)
network = fully_connected(network, 256, activation='tanh')
network = dropout(network, 0.8)
network = fully_connected(network, 10, activation='softmax')
network = regression(network, optimizer='adam', learning_rate=0.01,
                     loss='categorical_crossentropy', name='target')

# Training
model = tflearn.DNN(network, tensorboard_verbose=0)
model.fit({'input': X}, {'target': Y}, n_epoch=20,
          validation_set=({'input': testX}, {'target': testY}),
          snapshot_step=100, show_metric=True, run_id='convnet_mnist')
As the code shows, deep-learning code built on TF Learn is also remarkably concise.
TF Learn is a high-level, Scikit-learn-style wrapper around TensorFlow, offering a third option alongside native TensorFlow and Scikit-learn. For users who are at home with Scikit-learn and weary of TensorFlow's verbosity, it is nothing short of a blessing, and it is well worth the serious attention of machine-learning and data-mining practitioners.
Wang Hao is head of the big data department and senior architect at Hengchang Litong, with a B.S. and M.S. from the University of Utah and a part-time MBA from the University of International Business and Economics. He has years of R&D and engineering-management experience at Baidu, Sina, NetEase, and Douban, specializing in machine learning, big data, recommender systems, and social network analysis. He has published 8 papers at international conferences and in journals including TVCG and ASONAM, and his undergraduate thesis won a paper award at the international conference IEEE SMI 2008.