Denoise with Auto Encoder of H2O in Python for MNIST

We talked about the auto-encoder in earlier posts, for example with R (https://charleshsliao.wordpress.com/2017/04/14/identify-arguments-of-h2o-deep-learning-model-with-tuned-auto-encoder-in-r-with-mnist/), and we also discussed the three functions of an auto-encoder there. This is a pretty standard example used for benchmarking anomaly detection models. Here we use Python 3 and the H2O framework to build the auto-encoder. More details can be found in Sebastian Raschka's book: https://www.goodreads.com/book/show/25545994-python-machine-learning?ac=1&from_search=true
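The denoising idea can be sketched without an H2O cluster. This is a minimal sketch under stated assumptions, not the post's H2O code: it swaps in scikit-learn's `MLPRegressor` as a one-hidden-layer auto-encoder and uses scikit-learn's small 8x8 digits dataset as a stand-in for MNIST. The network is trained to map noisy images back to clean ones through a 32-unit bottleneck, and the per-image reconstruction error is the usual anomaly score. The noise level (0.3) and layer size are illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Assumption: the 8x8 sklearn digits stand in for MNIST (no download needed).
X, _ = load_digits(return_X_y=True)
X = X / 16.0  # scale pixel values into [0, 1]
X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

rng = np.random.default_rng(0)

def add_noise(data, scale=0.3):
    """Corrupt images with Gaussian noise, then clip back into [0, 1]."""
    return np.clip(data + rng.normal(0.0, scale, data.shape), 0.0, 1.0)

X_train_noisy = add_noise(X_train)
X_test_noisy = add_noise(X_test)

# A 32-unit hidden layer forces a compressed code for the 64 pixels.
autoencoder = MLPRegressor(hidden_layer_sizes=(32,), activation="relu",
                           max_iter=500, random_state=0)
# Denoising: noisy images are the input, clean images are the target.
autoencoder.fit(X_train_noisy, X_train)

X_denoised = np.clip(autoencoder.predict(X_test_noisy), 0.0, 1.0)
mse_noisy = np.mean((X_test_noisy - X_test) ** 2)
mse_denoised = np.mean((X_denoised - X_test) ** 2)
print(f"MSE noisy vs clean:    {mse_noisy:.4f}")
print(f"MSE denoised vs clean: {mse_denoised:.4f}")

# Anomaly score: how badly each clean test image is reconstructed.
recon_error = np.mean((autoencoder.predict(X_test) - X_test) ** 2, axis=1)
```

In H2O itself the corresponding estimator is `H2OAutoEncoderEstimator`; the structure (bottleneck, reconstruction loss, reconstruction error as score) is the same.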


Preprocess: LDA and Kernel PCA in Python

Principal component analysis (PCA) is an unsupervised linear transformation technique that is widely used across different fields, most prominently for dimensionality reduction. We talked about it here: https://charleshsliao.wordpress.com/2017/05/28/preprocess-pca-application-in-python/. We use data from the sklearn library, and we work in Python 3. Most of the code comes from Sebastian Raschka's book: https://www.goodreads.com/book/show/25545994-python-machine-learning?ac=1&from_search=true
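Both techniques in the title are available in scikit-learn. The sketch below is an illustration, not the post's exact code: it assumes the wine dataset for LDA (supervised, so it uses the labels) and the two-moons toy data with an RBF kernel and `gamma=15` for kernel PCA (unsupervised, nonlinear). LDA can produce at most (number of classes - 1) components, which is 2 for the three wine classes.

```python
from sklearn.datasets import load_wine, make_moons
from sklearn.decomposition import KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

# LDA: supervised linear projection that maximizes class separability.
X, y = load_wine(return_X_y=True)
X_std = StandardScaler().fit_transform(X)
lda = LinearDiscriminantAnalysis(n_components=2)  # at most n_classes - 1
X_lda = lda.fit_transform(X_std, y)
print(X_lda.shape)  # (178, 2)
print(lda.explained_variance_ratio_)

# Kernel PCA: PCA in an implicit feature space defined by an RBF kernel,
# which can "unfold" classes that are not linearly separable.
X_moons, y_moons = make_moons(n_samples=100, random_state=123)
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=15)
X_kpca = kpca.fit_transform(X_moons)
print(X_kpca.shape)  # (100, 2)
```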

Movie Recommender: Affinity Analysis of Apriori in Python

"Affinity analysis can be applied to many processes that do not use transactions in this sense: fraud detection, customer segmentation, software optimization, and product recommendations. The classic algorithm for affinity analysis is called the Apriori algorithm." More details can be found in Robert Layton's book here: https://www.goodreads.com/book/show/26019855-learning-data-mining-with-python?from_search=true We explored a similar "Market Basket" method here:…
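The core of Apriori fits in a few lines of plain Python. This is a minimal sketch on hypothetical toy data (the `favorable_reviews` sets below are invented for illustration, standing in for the movies each user rated favorably). It grows frequent itemsets one item at a time, keeps only candidates whose every smaller subset is already frequent (the Apriori property), and then scores rules "premise -> conclusion" by confidence = support(itemset) / support(premise).

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: each set is the movies one user rated favorably.
favorable_reviews = [
    {"Toy Story", "Aladdin", "Lion King"},
    {"Toy Story", "Aladdin"},
    {"Toy Story", "Lion King"},
    {"Aladdin", "Lion King"},
    {"Toy Story", "Aladdin", "Lion King", "Heat"},
    {"Heat", "Casino"},
]

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    counts = defaultdict(int)
    for t in transactions:
        for item in t:
            counts[frozenset([item])] += 1
    frequent = {s: n for s, n in counts.items() if n >= min_support}
    all_frequent = dict(frequent)
    k = 2
    while frequent:
        # Join frequent (k-1)-itemsets into k-itemsets; prune any candidate
        # with an infrequent (k-1)-subset (the Apriori property).
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        counts = defaultdict(int)
        for t in transactions:
            for cand in candidates:
                if cand <= t:
                    counts[cand] += 1
        frequent = {s: n for s, n in counts.items() if n >= min_support}
        all_frequent.update(frequent)
        k += 1
    return all_frequent

itemsets = apriori(favorable_reviews, min_support=3)

# Every rule premise -> conclusion derived from a frequent itemset.
rules = []
for itemset, support in itemsets.items():
    if len(itemset) < 2:
        continue
    for conclusion in itemset:
        premise = itemset - {conclusion}
        rules.append((premise, conclusion, support / itemsets[premise]))

for premise, conclusion, conf in sorted(rules, key=lambda r: -r[2]):
    print(f"{sorted(premise)} -> {conclusion}: confidence {conf:.2f}")
```

With `min_support=3` on this toy data, the three movie pairs are frequent, the triple is not (it appears in only two sets), and each of the six rules has confidence 0.75.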

Quick Cross Validation and Grid Search of Parameters in Python

Cross-validation is a way to guard against overfitting when training a model, and we have also applied the grid search method in both Python and R: https://charleshsliao.wordpress.com/2017/05/20/logistic-regression-in-python-to-tune-parameter-c/ https://charleshsliao.wordpress.com/2017/04/24/cnndnn-of-keras-in-r-backend-tensorflow-for-mnist/ Here we focus on how to combine the two methods to identify the best parameters, model, and score without overfitting. We use data from the sklearn library, and the IDE…
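A minimal sketch of combining the two methods with scikit-learn, assuming an SVC on the iris data (the model, dataset, and grid values are illustrative, not taken from the post). The key point is that the grid search runs cross-validation only on the training split, so the held-out test score is an honest estimate for the chosen parameters.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Plain 5-fold cross-validation for a single, fixed parameter setting.
scores = cross_val_score(SVC(C=1.0, gamma=0.1), X_train, y_train, cv=5)
print("CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))

# Grid search: 5-fold CV for every (C, gamma) pair, training data only.
param_grid = {"C": [0.01, 0.1, 1, 10, 100],
              "gamma": [0.001, 0.01, 0.1, 1, 10]}
grid = GridSearchCV(SVC(), param_grid, cv=5)
grid.fit(X_train, y_train)  # refits the best model on all training data

print("Best parameters:", grid.best_params_)
print("Best CV score: %.3f" % grid.best_score_)
# The test set was never seen during the search.
print("Test score: %.3f" % grid.score(X_test, y_test))
```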

Clustering Algorithms Evaluation in Python

Sometimes we run clustering and then match the clusters with the true labels of the dataset; this is one way to evaluate clustering results. We can also evaluate clusters with other methods, both with and without ground truth for the data. We use data from the sklearn library, and the IDE is Sublime Text 3.…
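The two cases map onto two scikit-learn metrics. In this sketch (the two-moons data and the three algorithms are assumptions chosen for illustration), the adjusted Rand index (ARI) compares cluster assignments with the true labels, while the silhouette coefficient needs no ground truth and only looks at how compact and well separated the clusters are.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score, silhouette_score
from sklearn.preprocessing import StandardScaler

X, y = make_moons(n_samples=200, noise=0.05, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

algorithms = [KMeans(n_clusters=2, n_init=10, random_state=0),
              AgglomerativeClustering(n_clusters=2),
              DBSCAN()]

results = {}
for algo in algorithms:
    labels = algo.fit_predict(X_scaled)
    # With ground truth: 1.0 means a perfect match, about 0 means random.
    ari = adjusted_rand_score(y, labels)
    # Without ground truth: silhouette needs at least two distinct labels.
    sil = (silhouette_score(X_scaled, labels)
           if len(set(labels)) > 1 else float("nan"))
    results[type(algo).__name__] = (ari, sil)
    print(f"{type(algo).__name__:25s} ARI={ari:.2f} silhouette={sil:.2f}")
```

Silhouette rewards compact, convex clusters, so on shapes like the moons it can rank k-means above DBSCAN even when the ARI says DBSCAN recovered the true groups better.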

How Certain is This Classifier? Uncertainty Estimates in Python

We are interested not only in which class a classifier predicts for a certain test point, but also in how certain it is that this is the right class. There are two different functions that reveal the certainty of the classifier. We use data from the sklearn library, and the IDE is Sublime Text 3. Most of the code…
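In scikit-learn, these two functions are `decision_function` and `predict_proba`. The sketch below assumes a gradient boosting classifier on the `make_circles` toy data (an illustrative choice). For a binary problem, `decision_function` returns one signed score per sample, where a positive value means `classes_[1]` and a larger magnitude means more confidence; `predict_proba` returns one probability per class, with each row summing to 1.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_circles(noise=0.25, factor=0.5, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gbrt = GradientBoostingClassifier(random_state=0)
gbrt.fit(X_train, y_train)

# Signed score per sample: sign gives the class, magnitude the confidence.
decision = gbrt.decision_function(X_test)
# Thresholding at 0 recovers the predicted labels.
pred_from_decision = gbrt.classes_[(decision > 0).astype(int)]

# Probability per class; each row sums to 1.
proba = gbrt.predict_proba(X_test)

print("decision_function:", np.round(decision[:5], 2))
print("predict_proba:\n", np.round(proba[:5], 2))
print("Matches predict():", np.array_equal(pred_from_decision, gbrt.predict(X_test)))
```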

A Business Plan Draft in Tourism Industry: File/Server and Data Lake

1.1 File/Server Architecture

We are thinking about Apache Tomcat and Apache HTTP. The Apache HTTP Server:
- is a powerful, flexible, HTTP/1.1-compliant web server (now with Apache HTTP 2.4)
- implements the latest protocols, including HTTP/1.1 (RFC 2616)
- is highly configurable and extensible with third-party modules
- can be customized by writing 'modules' using the Apache module…