The limits of one's ability must be broken through by the strength of one's will.
Life is a self-fulfilling prophecy.


```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.ensemble import RandomForestRegressor

np.random.seed(0)

# Load the data
df = pd.read_csv('/winequality-red.csv')

# The target variable is 'quality'.
Y = df['quality']
X = df[['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
        'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
        'pH', 'sulphates', 'alcohol']]

# Split the data into train and test sets:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)

# Build the model with the random forest regression algorithm:
model = RandomForestRegressor(max_depth=6, random_state=0, n_estimators=10)
model.fit(X_train, Y_train)
```
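The snippet stops at `model.fit`; a hedged sketch of the natural next step is scoring the fitted forest on the held-out split. Synthetic data stands in for `winequality-red.csv` here so the sketch runs on its own:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 11))            # 11 features, like the wine data
Y = X[:, 0] * 2 + rng.normal(size=300)    # synthetic stand-in for 'quality'

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0)

model = RandomForestRegressor(max_depth=6, random_state=0, n_estimators=10)
model.fit(X_train, Y_train)

# Evaluate on the 20% held-out set
pred = model.predict(X_test)
print('test MSE:', mean_squared_error(Y_test, pred))
```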

Notes

  • Pick the categorical variables you want to encode as dummies.
  • Use pd.get_dummies to convert each categorical variable into several dummy columns.
```python
categorical_vars = ['peak', 'business_line', 'gender', 'age_level']
# all_vars is assumed to hold every feature name; the rest are continuous
continuous_vars = set.difference(set(all_vars), set(categorical_vars))

cate_list = []
for i in categorical_vars:
    print(i)
    fe_dummy = pd.get_dummies(X[i])
    cate_list.append(fe_dummy)

dummy_all = pd.concat(cate_list, axis=1)
dummy_all.head()
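The same result can usually be obtained in one call by passing the column list to pd.get_dummies via its columns parameter; a minimal sketch on a made-up DataFrame (the column values here are illustrative, not from the post):

```python
import pandas as pd

# Illustrative data; in the post these columns come from X
X = pd.DataFrame({
    'gender': ['F', 'M', 'F', 'M'],
    'age_level': ['young', 'old', 'old', 'young'],
    'income': [50, 60, 55, 70],
})

# columns= limits encoding to the listed categorical variables;
# continuous columns such as 'income' pass through unchanged
dummy_all = pd.get_dummies(X, columns=['gender', 'age_level'])
print(dummy_all.columns.tolist())
```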

Key Points

  • Convert your time-series data into a matrix with a moving window, so that each row has exactly the number of inputs (n_steps_in) and outputs (n_steps_out) you defined.
  • After training the model, I compute the MSE for each output step; unsurprisingly, the further ahead the step, the larger its MSE.
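The windowing step above can be sketched as follows (the function name split_sequence is my own, not from the post):

```python
import numpy as np

def split_sequence(seq, n_steps_in, n_steps_out):
    """Slide a window over seq, yielding (inputs, outputs) rows."""
    X, y = [], []
    for i in range(len(seq) - n_steps_in - n_steps_out + 1):
        X.append(seq[i:i + n_steps_in])
        y.append(seq[i + n_steps_in:i + n_steps_in + n_steps_out])
    return np.array(X), np.array(y)

series = np.arange(10)                                  # 0..9
X, y = split_sequence(series, n_steps_in=3, n_steps_out=2)
print(X.shape, y.shape)   # each row: 3 inputs, 2 outputs
```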
Read more »

Description

Propensity Score Matching is a sample-matching method: when random sampling is not possible, it can effectively control for confounding factors between groups and avoid selection bias between the two samples.

Algorithm

Reduce X to one dimension with a dimension-reduction step: obtain the propensity score of every sample through logistic regression, then match the samples on that score. The most commonly used approach is Nearest Neighbor Matching (NNM).
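A minimal sketch of that pipeline on synthetic data, using scikit-learn's LogisticRegression for the scores and NearestNeighbors for the matching (all data and names here are illustrative, not from the post):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                          # covariates
t = (X[:, 0] + rng.normal(size=200) > 0).astype(int)   # treatment flag

# Step 1: reduce X to one dimension -- the propensity score
ps = LogisticRegression().fit(X, t).predict_proba(X)[:, 1]

# Step 2: nearest-neighbor matching on the score
treated = ps[t == 1].reshape(-1, 1)
control = ps[t == 0].reshape(-1, 1)
nn = NearestNeighbors(n_neighbors=1).fit(control)
_, idx = nn.kneighbors(treated)   # idx[i] = matched control for treated i

print(len(treated), 'treated units matched to controls')
```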

  I bought a new Mac, but my Hexo blog was deployed on my work computer, so I looked into how to sync the existing deployment to the new machine so that I can post from both devices. After reading many articles and trying many approaches, I finally got it working by following this article step by step; if you need the same setup, it is worth a look:

利用Hexo在多台电脑上提交和更新github pages博客

Sample Size Calculation

```python
import scipy.stats as stats

def sample_size_calculation(mu, sigma, MDE, alpha=0.05, beta=0.2):
    # Two-sample size per group for a relative MDE (a lift of mu * MDE)
    return 2 * (sigma**2) * ((stats.norm.ppf(1 - alpha/2) + stats.norm.ppf(1 - beta))**2) / ((mu * MDE)**2)
```
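A quick sanity check of that formula (the function is restated so the sketch runs on its own; the input numbers are assumed, not from the post):

```python
import math
import scipy.stats as stats

def sample_size_calculation(mu, sigma, MDE, alpha=0.05, beta=0.2):
    return 2 * (sigma**2) * ((stats.norm.ppf(1 - alpha/2) + stats.norm.ppf(1 - beta))**2) / ((mu * MDE)**2)

# Detect a 5% relative lift on a metric with mean 10 and std 2,
# at 5% significance and 80% power
n = sample_size_calculation(mu=10, sigma=2, MDE=0.05)
print(math.ceil(n), 'samples per group')   # roughly 252
```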

Minimum Detectable Effect

```python
import numpy as np
from scipy.stats import norm

sample_size = 1000
alpha = 0.05
z = norm.isf(alpha / 2)
estimated_variance = ds.y.var()  # variance of the outcome in the data
detectable_effect_size = z * np.sqrt(2 * estimated_variance / sample_size)
```
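Note that the snippet above only accounts for the significance threshold. If you also want a power guarantee (here 80%), the usual form adds a z term for beta, mirroring the sample-size formula. A runnable sketch with an assumed variance (the post takes it from ds.y.var()):

```python
import numpy as np
from scipy.stats import norm

sample_size = 1000
alpha, beta = 0.05, 0.2
estimated_variance = 4.0  # assumed here; the post uses ds.y.var()

z = norm.isf(alpha / 2) + norm.isf(beta)   # significance + power terms
mde = z * np.sqrt(2 * estimated_variance / sample_size)
print(round(mde, 3))
```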

Jordan B. Peterson is a psychology professor at the University of Toronto. His book 12 Rules for Life is one of my favorite books.

  • Tell the truth.
  • Do not do things that you hate.
  • Act so that you can tell the truth about how you act.
  • Pursue what is meaningful, not what is expedient.
  • If you have to choose, be the one who does things, instead of the one who is seen to do things.
  • Pay attention.
  • Assume that the person you are listening to might know something you need to know.
  • Listen to them hard enough so that they will share it with you.
  • Plan and work diligently to maintain the romance in your relationships.
  • Be careful who you share good news with.
  • Be careful who you share bad news with.