Data Science 07 Time Series Data 02 - Stock Price Prediction - Prophet

Posted: by Creative Commons Licence

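1. Stock Price Prediction - statsmodels

Algorithmic Trading with Python - the pandas_datareader module

Fetching stock data with Python

Fetching stock price data in Pandas - 안수찬's blog

1) Module rename and installation

pandas-datareader official documentation

  • The former pandas.io.data module has been renamed to pandas_datareader.data
  • Install the module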
➜  pip install pandas-datareader

Collecting pandas-datareader
  Downloading pandas_datareader-0.5.0-py2.py3-none-any.whl (74kB)
    100% |████████████████████████████████| 81kB 606kB/s 
Installing collected packages: requests-ftp, requests-file, pandas-datareader
  Running setup.py install for requests-ftp ... done
Successfully installed pandas-datareader-0.5.0 requests-file-1.4.3 requests-ftp-0.3.1

2) Korean Air stock price

  • Ticker code: 003490
  • Prices from January 1, 2015 to the present
import warnings
warnings.filterwarnings("ignore")

import itertools
import pandas as pd
import numpy as np

import statsmodels.api as sm
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
%matplotlib inline

# Korean font
import matplotlib.font_manager as fm
# Apply the font
font_location = '/usr/share/fonts/truetype/nanum/NanumBarunGothic.ttf'
font_name = fm.FontProperties(fname=font_location).get_name()

from matplotlib import rc
rc('font', family=font_name)
import pandas_datareader.data as web
from datetime import datetime
# Korean Air stock prices from January 1, 2015 to the present

start = datetime( 2015, 1, 1 )
end = datetime.now()

# Yahoo Finance
KA = web.DataReader('003490.KS', 'yahoo', start, end)
KA.head()
                    Open          High           Low         Close     Adj Close   Volume
Date
2015-01-02  43374.101563  44282.398438  40966.898438  41012.300781  41012.300781  1062595
2015-01-05  41330.300781  42329.500000  40194.800781  41966.101563  41966.101563   912246
2015-01-06  42374.898438  43192.398438  41466.500000  41466.500000  41466.500000   948055
2015-01-07  39059.398438  40558.199219  37197.199219  39513.601563  39513.601563  3256086
2015-01-08  39604.398438  40240.199219  38605.199219  39331.898438  39331.898438   549048
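KA['Close'].plot(
    style='--',
    figsize=(12, 6)
)
# 7-day moving average of the closing price
KA['Close'].rolling( 7 ).mean().plot( lw=2 )
plt.title('Korean Air closing price')
plt.legend( ['Closing price', 'Moving average (7 days)' ] )
plt.show()

[Figure: Korean Air closing price with its 7-day moving average]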
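To make the 7-day moving average above concrete, here is a minimal sketch on made-up prices (the values and dates are assumptions, not Korean Air data). It shows that `Series.rolling(window=7).mean()` leaves the first six entries empty and then averages the most recent seven observations.

```python
import pandas as pd

# Hypothetical closing prices for eight consecutive days (illustration only)
close = pd.Series(
    [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0],
    index=pd.date_range("2015-01-01", periods=8, freq="D"),
)

# Trailing 7-observation mean; the first 6 values are NaN because
# fewer than 7 observations are available there
ma7 = close.rolling(window=7).mean()
print(ma7)
```

Because the window counts observations rather than calendar days, on real stock data "7 days" means seven trading days.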

3) Samsung Electronics stock price

  • Ticker code: 005930
start = datetime( 2010, 1, 1 )
end = datetime.now()

SamSung = web.DataReader('005930.KS', 'yahoo', start, end)
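SamSung['Close'].plot(
    style='--',
    figsize=(12, 6)
)
# 7-day moving average of the Samsung Electronics closing price
SamSung['Close'].rolling( 7 ).mean().plot( lw=2 )
plt.title('Samsung Electronics')
plt.legend( ['Closing price', 'Moving average (7 days)' ] )
plt.show()

[Figure: Samsung Electronics closing price with its 7-day moving average]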

SamSung.head(3)
                Open      High       Low     Close    Adj Close  Volume
Date
2010-01-04  803000.0  809000.0  800000.0  809000.0  740499.0625  239016
2010-01-05  826000.0  829000.0  815000.0  822000.0  752398.2500  558517
2010-01-06  829000.0  841000.0  826000.0  841000.0  769789.5000  458977
len(SamSung)
1991
from pylab import rcParams
rcParams['figure.figsize'] = 12, 8

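# Decompose the Samsung Electronics closing price
y = SamSung['Close'][ 1600 : ] 

# `period` is the argument name in recent statsmodels releases (older ones called it `freq`)
decomposition = sm.tsa.seasonal_decompose(  y, period=12 )

fig = decomposition.plot()
plt.show()

[Figure: seasonal decomposition of the Samsung Electronics closing price (observed, trend, seasonal, residual)]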
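The decomposition above splits a series into trend, seasonal, and residual parts. The sketch below is a simplified additive decomposition written with pandas only, to show the idea; it is not the exact filter that statsmodels uses, and the function name `simple_additive_decompose`, the period of 7, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def simple_additive_decompose(series, period):
    """Simplified additive decomposition (odd `period` assumed)."""
    # Trend: centered moving average over one full period
    trend = series.rolling(window=period, center=True).mean()
    detrended = series - trend
    # Seasonal: average detrended value at each position within the period
    positions = np.arange(len(series)) % period
    seasonal_means = detrended.groupby(positions).mean()
    seasonal_means = seasonal_means - seasonal_means.mean()
    seasonal = pd.Series(seasonal_means.loc[positions].to_numpy(), index=series.index)
    # Residual: whatever trend and seasonal do not explain
    resid = series - trend - seasonal
    return trend, seasonal, resid

# Synthetic series: linear trend plus a repeating zero-sum pattern of length 7
pattern = np.array([3.0, -1.0, -2.0, 0.0, 1.0, -3.0, 2.0])
t = np.arange(28)
series = pd.Series(
    100.0 + 0.5 * t + np.tile(pattern, 4),
    index=pd.date_range("2017-01-01", periods=28, freq="D"),
)

trend, seasonal, resid = simple_additive_decompose(series, period=7)
print(seasonal.head(7))
```

On this synthetic input the seasonal component recovers the repeating pattern and the residual is essentially zero; on real prices the residual carries everything the two other components miss.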

# Define the p, d and q parameters to take the value 0 or 1 (range(0, 2) excludes 2)
p = d = q = range(0, 2)

# Generate all different combinations of p, q and d triplets
pdq = list( itertools.product(p, d, q))

# Generate all different combinations of seasonal p, q and d triplets
seasonal_pdq = [ (x[0], x[1], x[2], 12)  for x in pdq ]

print('Example of parameter combinations for Seasonal ARIMA ...')
print('SARIMAX:  {} x {}'.format(pdq[1], seasonal_pdq[1]) )
print('SARIMAX:  {} x {}'.format(pdq[1], seasonal_pdq[2]) )
print('SARIMAX:  {} x {}'.format(pdq[2], seasonal_pdq[3]) )
print('SARIMAX:  {} x {}'.format(pdq[2], seasonal_pdq[4]) )
Example of parameter combinations for Seasonal ARIMA ...
SARIMAX:  (0, 0, 1) x (0, 0, 1, 12)
SARIMAX:  (0, 0, 1) x (0, 1, 0, 12)
SARIMAX:  (0, 1, 0) x (0, 1, 1, 12)
SARIMAX:  (0, 1, 0) x (1, 0, 0, 12)
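As a quick check on the size of the search space built above, the sketch below rebuilds the same two lists and pairs them up. The name `grid` is introduced here for illustration only.

```python
import itertools

# Same construction as above: each of p, d, q is 0 or 1
p = d = q = range(0, 2)
pdq = list(itertools.product(p, d, q))

# Seasonal orders reuse the same triplets with a seasonal period of 12
seasonal_pdq = [(P, D, Q, 12) for P, D, Q in pdq]

# Every (order, seasonal_order) pair that the nested loops will try
grid = list(itertools.product(pdq, seasonal_pdq))
print(len(pdq), len(seasonal_pdq), len(grid))  # 8 8 64
```

With 64 model fits per search, widening the range to 0..2 would raise the count to 27 x 27 = 729, which is worth keeping in mind before enlarging the grid.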
warnings.filterwarnings("ignore")

y = SamSung['Close']

select_candi = 10000000
param_candi = ( 0, 0, 0 )
param_seasonal_candi = ( 0, 0, 0)

count=0
end_count = len(pdq)

for param in pdq:   
    for param_seasonal in seasonal_pdq:
        try:
            mod = sm.tsa.statespace.SARIMAX( 
                                            y,
                                            order=param,
                                            seasonal_order=param_seasonal,
                                            enforce_stationarity=False,
                                            enforce_invertibility=False
                                           )
            results = mod.fit()
            count += 1
            if count <= 5:
                print('ARIMA{}x{}12 - AIC:{}'.format(param, param_seasonal, results.aic))
            
            if results.aic < select_candi:
                select_candi = results.aic
                param_candi = param
                param_seasonal_candi = param_seasonal
        except Exception:
            # Skip parameter combinations that fail to fit
            continue
            
print(param_candi, param_seasonal_candi, select_candi)            
ARIMA(0, 0, 0)x(0, 0, 1, 12)12 - AIC:60253.297878071855
ARIMA(0, 0, 0)x(0, 1, 1, 12)12 - AIC:49644.270908598235
ARIMA(0, 0, 0)x(1, 0, 0, 12)12 - AIC:49928.79116493459
ARIMA(0, 0, 0)x(1, 0, 1, 12)12 - AIC:49903.76087716084
ARIMA(0, 0, 0)x(1, 1, 0, 12)12 - AIC:49669.749916549496
(1, 1, 1) (0, 0, 1, 12) 45391.1132768003
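The search above keeps the combination with the lowest AIC. AIC is defined as 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood, so it rewards fit while penalizing extra parameters. The sketch below applies that formula to made-up candidates; the model names, log-likelihoods, and parameter counts are hypothetical and do not come from the fits above.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical (log-likelihood, parameter count) pairs, for illustration only
candidates = {
    "model_a": (-1050.0, 2),
    "model_b": (-1040.0, 4),
    "model_c": (-1039.5, 7),
}

scores = {name: aic(llf, k) for name, (llf, k) in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

Here `model_c` has the highest likelihood but loses to `model_b`, because its three extra parameters cost more than the small gain in fit.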
mod = sm.tsa.statespace.SARIMAX(
    y,
    order=(1, 1, 1),
    seasonal_order=(0, 0, 1, 12),
    enforce_stationarity=False,
    enforce_invertibility=False
)

results = mod.fit()

print( results.summary().tables[1] )
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1         -0.6241      0.182     -3.431      0.001      -0.981      -0.268
ma.L1          0.6733      0.172      3.925      0.000       0.337       1.010
ma.S.L12       0.0194      0.019      0.997      0.319      -0.019       0.058
sigma2      5.519e+08   5.22e-11   1.06e+19      0.000    5.52e+08    5.52e+08
==============================================================================
results.plot_diagnostics( figsize=(12, 10) )
plt.show()

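[Figure: diagnostic plots for the SARIMAX (1, 1, 1) x (0, 0, 1, 12) model fitted to daily closing prices]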

pred = results.get_prediction(
    start=pd.to_datetime('2015-1-2'),
    dynamic=False
)

pred_ci = pred.conf_int()
# Observed data (the '2000' slice keeps everything, since the series starts in 2010)
ax = y[ '2000' : ].plot( label='observed', figsize=(12, 8) )

# 예측
pred.predicted_mean.plot(
    ax=ax,
    label='One-step ahead Forecast',
    alpha=.7
)

ax.fill_between(
    pred_ci.index,
    pred_ci.iloc[ : , 0 ],
    pred_ci.iloc[ : , 1 ],
    color='k',
    alpha=.2
)

ax.fill_betweenx(
    ax.get_ylim(),
    pd.to_datetime('2015-01-01'),
    y.index[-1],
    alpha=.3,
)

ax.set_xlabel('Date')
ax.set_ylabel('Close')
plt.legend()

plt.show()

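[Figure: observed daily closing price with one-step-ahead forecast and confidence interval from 2015 onward]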
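The prediction above uses `dynamic=False`, meaning each forecast is made one step ahead from the actual observations up to the previous point. The sketch below contrasts that with a dynamic forecast on a toy AR(1) model. It is a hand-written illustration, not statsmodels code; the function names, the coefficient `phi`, and the data are assumptions.

```python
def one_step_predictions(y, phi):
    """Predict y[t] from the observed y[t-1] for every t >= 1."""
    return [phi * y[t - 1] for t in range(1, len(y))]

def dynamic_predictions(y, phi, start):
    """From `start` on, feed each prediction back in place of the observation."""
    preds = []
    prev = y[start - 1]          # last observation the forecast may use
    for _ in range(start, len(y)):
        prev = phi * prev        # the next step relies on the previous prediction
        preds.append(prev)
    return preds

y = [8.0, 4.0, 6.0, 2.0]         # made-up observations
phi = 0.5                        # assumed AR(1) coefficient

print(one_step_predictions(y, phi))        # [4.0, 2.0, 3.0]
print(dynamic_predictions(y, phi, start=1))  # [4.0, 2.0, 1.0]
```

One-step-ahead forecasts tend to track the observed series closely because they are corrected at every step, so a tight fit in such a plot says little about how the model behaves several steps into the future.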

y = SamSung['Close'].resample('MS').mean()
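The `resample('MS').mean()` call above turns daily closes into monthly averages, each labelled with the first day of its month. A minimal sketch on synthetic daily data (the values are made up):

```python
import numpy as np
import pandas as pd

# Synthetic daily series: 1.0 throughout January, 3.0 throughout February
idx = pd.date_range("2017-01-01", "2017-02-28", freq="D")
daily = pd.Series(np.where(idx.month == 1, 1.0, 3.0), index=idx)

# 'MS' = month start: one row per month, stamped with the month's first day
monthly = daily.resample("MS").mean()
print(monthly)
```

Averaging by month shrinks the series to one point per month, which is why the AIC values in the next search are much smaller than in the daily search; AIC values are only comparable between models fitted to the same data.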
warnings.filterwarnings("ignore")

select_candi = 10000000
param_candi = ( 0, 0, 0 )
param_seasonal_candi = ( 0, 0, 0)

count=0
end_count = len(pdq)

for param in pdq:   
    for param_seasonal in seasonal_pdq:
        try:
            mod = sm.tsa.statespace.SARIMAX( 
                                            y,
                                            order=param,
                                            seasonal_order=param_seasonal,
                                            enforce_stationarity=False,
                                            enforce_invertibility=False
                                           )
            results = mod.fit()
            count += 1
            if count <= 5:
                print('ARIMA{}x{}12 - AIC:{}'.format(param, param_seasonal, results.aic))
            
            if results.aic < select_candi:
                select_candi = results.aic
                param_candi = param
                param_seasonal_candi = param_seasonal
        except Exception:
            # Skip parameter combinations that fail to fit
            continue
            
print(param_candi, param_seasonal_candi, select_candi)        
ARIMA(0, 0, 0)x(0, 0, 1, 12)12 - AIC:2588.3600182104815
ARIMA(0, 0, 0)x(0, 1, 1, 12)12 - AIC:2077.9442515545934
ARIMA(0, 0, 0)x(1, 0, 0, 12)12 - AIC:2411.5750802284015
ARIMA(0, 0, 0)x(1, 0, 1, 12)12 - AIC:2388.3744838038806
ARIMA(0, 0, 0)x(1, 1, 0, 12)12 - AIC:2100.2859990842962
(0, 1, 1) (1, 1, 1, 12) 1798.848144371688
mod = sm.tsa.statespace.SARIMAX(
    y,
    order=( 0, 1, 1 ),
    seasonal_order=( 1, 1, 1, 12 ),
    enforce_stationarity=False,
    enforce_invertibility=False
)

results = mod.fit()

print( results.summary().tables[1] )
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ma.L1          0.2222      0.243      0.916      0.360      -0.253       0.698
ar.S.L12      -0.8107      0.208     -3.888      0.000      -1.219      -0.402
ma.S.L12      -0.0022      0.278     -0.008      0.994      -0.546       0.542
sigma2      1.145e+10   7.04e-12   1.63e+21      0.000    1.15e+10    1.15e+10
==============================================================================
results.plot_diagnostics( figsize=(12, 10) )
plt.show()

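[Figure: diagnostic plots for the SARIMAX (0, 1, 1) x (1, 1, 1, 12) model fitted to monthly mean closing prices]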

pred = results.get_prediction(
    start=pd.to_datetime('2015-01-01'),
    dynamic=False
)

pred_ci = pred.conf_int()
# Observed data from 2003 onward
ax = y[ '2003' : ].plot( label='observed', figsize=(12, 8) )

# 예측
pred.predicted_mean.plot(
    ax=ax,
    label='One-step ahead Forecast',
    alpha=.7
)

ax.fill_between(
    pred_ci.index,
    pred_ci.iloc[ : , 0 ],
    pred_ci.iloc[ : , 1 ],
    color='k',
    alpha=.2
)

ax.fill_betweenx(
    ax.get_ylim(),
    pd.to_datetime('2015-01-01'),
    y.index[-1],
    alpha=.3,
)

ax.set_xlabel('Date')
plt.legend()

plt.show()

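[Figure: observed monthly mean closing price with one-step-ahead forecast and confidence interval from 2015 onward]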

2. Stock Price Prediction - Prophet

https://facebook.github.io/prophet/

https://github.com/facebook/prophet

1) install

  • Environment: Ubuntu 16.04 LTS, Python 3.6.2
  • Installing from the package is convenient and safe.
➜  pip install fbprophet

Collecting fbprophet
  Downloading fbprophet-0.2.1.tar.gz
Collecting pystan>=2.14 (from fbprophet)
  Downloading pystan-2.17.1.0-cp36-cp36m-manylinux1_x86_64.whl (68.1MB)
  100% |████████████████████████████████| 68.1MB 
  Downloading Cython-0.27.3-cp36-cp36m-manylinux1_x86_64.whl (3.1MB)
  100% |████████████████████████████████| 3.1MB 236kB/s 
Installing collected packages: Cython, pystan, fbprophet
  Running setup.py install for fbprophet ... \

2) Kia Motors stock

import warnings
warnings.filterwarnings("ignore")

import itertools
import pandas as pd
import numpy as np

import statsmodels.api as sm
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
%matplotlib inline
import pandas_datareader.data as web
from datetime import datetime

# Korean font
import matplotlib.font_manager as fm
# Apply the font
font_location = '/usr/share/fonts/truetype/nanum/NanumBarunGothic.ttf'
font_name = fm.FontProperties(fname=font_location).get_name()

from matplotlib import rc
rc('font', family=font_name)


from fbprophet import Prophet
start = datetime( 1990, 1, 1 )
end = datetime( 2017, 6, 30 )

# Kia Motors stock
KIA = web.DataReader('000270.KS', 'yahoo', start, end )
KIA.head()
                   Open         High          Low        Close    Adj Close     Volume
Date
2000-01-04  7404.520020  7665.240234  7300.229980  7665.240234  6111.007324   636300.0
2000-01-05  7404.520020  7404.520020  7248.089844  7248.089844  5778.440918   686100.0
2000-01-06  7331.520020  7519.240234  6935.220215  6935.220215  5529.009277   379000.0
2000-01-07  6987.359863  7143.799805  6778.790039  6778.790039  5404.296875   701400.0
2000-01-10  6841.359863  7102.080078  6810.069824  7091.649902  5653.720703  1076700.0
KIA['Close'].plot()
<matplotlib.axes._subplots.AxesSubplot at 0x7f1f1d6a3160>

[Figure: Kia Motors closing price]

# Keep data up to 2016-12-31

KIA_trunc = KIA[ : '2016-12-31' ]
KIA_trunc.head(3)
                  Open         High          Low        Close    Adj Close    Volume
Date
2000-01-04  7404.52002  7665.240234  7300.229980  7665.240234  6111.007324  636300.0
2000-01-05  7404.52002  7404.520020  7248.089844  7248.089844  5778.440918  686100.0
2000-01-06  7331.52002  7519.240234  6935.220215  6935.220215  5529.009277  379000.0
df = pd.DataFrame(
    {
        'ds': KIA_trunc.index,
        'y'  : KIA_trunc['Close']
    }    
)

df.reset_index( inplace=True )
del df['Date']
df.head(3)
          ds            y
0 2000-01-04  7665.240234
1 2000-01-05  7248.089844
2 2000-01-06  6935.220215
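Prophet expects a frame with exactly two columns, `ds` (dates) and `y` (values). The steps above build it and then drop the leftover `Date` column; the sketch below shows one alternative way to get the same shape in a single chain. The small `prices` frame is a stand-in built from the three rows shown above, not the downloaded data.

```python
import pandas as pd

# Stand-in for KIA_trunc: a 'Close' column indexed by a DatetimeIndex named 'Date'
prices = pd.DataFrame(
    {"Close": [7665.240234, 7248.089844, 6935.220215]},
    index=pd.DatetimeIndex(["2000-01-04", "2000-01-05", "2000-01-06"], name="Date"),
)

# Move the index into a column, then rename both columns to Prophet's names
df = prices["Close"].reset_index().rename(columns={"Date": "ds", "Close": "y"})
print(df)
```

The result has a plain 0..n-1 index and only the `ds` and `y` columns, so no `del` step is needed afterwards.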
m = Prophet()
m.fit(df)
<fbprophet.forecaster.Prophet at 0x7f1f1cb3a7f0>
future = m.make_future_dataframe( periods=365 )
future.tail(3)
             ds
4663 2017-12-27
4664 2017-12-28
4665 2017-12-29
forecast = m.predict( future )
forecast[ [ 'ds', 'yhat', 'yhat_lower', 'yhat_upper' ] ].tail()
             ds          yhat    yhat_lower    yhat_upper
4661 2017-12-25  35736.021198  26901.256042  43506.681983
4662 2017-12-26  35781.066339  26897.317236  42529.643307
4663 2017-12-27  35749.136369  26619.638417  43537.664311
4664 2017-12-28  35714.091179  26785.438078  43183.694451
4665 2017-12-29  35650.791639  26849.725225  43201.114843
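Since the model was trained on data up to 2016-12-31 while prices were downloaded through 2017-06-30, the first half of 2017 can serve as a hold-out period. The sketch below shows one way such a comparison might be computed: join forecast and actuals on `ds` and take the mean absolute error. The helper name `holdout_mae` and the tiny frames are hypothetical; the numbers are not Kia prices.

```python
import pandas as pd

def holdout_mae(forecast, actual):
    """Mean absolute error between yhat and y on the dates both frames share."""
    merged = forecast[["ds", "yhat"]].merge(actual[["ds", "y"]], on="ds", how="inner")
    return (merged["yhat"] - merged["y"]).abs().mean()

# Made-up forecast for three days and actual values for two of them
forecast_demo = pd.DataFrame({
    "ds": pd.to_datetime(["2017-01-02", "2017-01-03", "2017-01-04"]),
    "yhat": [10.0, 12.0, 14.0],
})
actual_demo = pd.DataFrame({
    "ds": pd.to_datetime(["2017-01-03", "2017-01-04"]),
    "y": [11.0, 16.0],
})

print(holdout_mae(forecast_demo, actual_demo))  # 1.5
```

The inner join matters here because `make_future_dataframe( periods=365 )` adds calendar days, including weekends and holidays on which no closing price exists.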
plt.figure( figsize=( 12, 8) )
plt.plot( KIA['Close'] )
plt.show()

[Figure: Kia Motors closing price, including the first half of 2017]

m.plot(forecast)

[Figure: Prophet forecast for Kia Motors with its uncertainty interval]

m.plot_components( forecast )

[Figure: components of the Prophet forecast]

3) Growth Model (logistic growth, using example_wp_R.csv)

df = pd.read_csv( './data_science/07. example_wp_R.csv' )
df['y' ] = np.log( df['y'] )
df['cap'] = 8.5
m = Prophet( growth='logistic' )
m.fit(df)
<fbprophet.forecaster.Prophet at 0x7f1f271417f0>
future = m.make_future_dataframe( periods=1826 )
future['cap'] = 8.5
fcst = m.predict( future )
m.plot( fcst )

[Figure: logistic-growth forecast saturating toward the cap of 8.5]
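With `growth='logistic'`, the trend is bounded by the `cap` column, which is why it must be set on both the training frame and the future frame. In its most basic form a logistic trend is g(t) = C / (1 + exp(-k (t - m))), with carrying capacity C, growth rate k, and midpoint m. The sketch below evaluates that basic curve only; Prophet's actual trend also allows changepoints and a time-varying cap, and the values of `k` and `m` here are arbitrary assumptions.

```python
import math

def logistic_trend(t, cap, k, m):
    """Basic logistic curve: approaches `cap` as t grows, equals cap / 2 at t == m."""
    return cap / (1.0 + math.exp(-k * (t - m)))

cap = 8.5           # same cap as in the example above
k, m = 1.0, 0.0     # assumed growth rate and midpoint, for illustration only

for t in (-5, 0, 5, 20):
    print(t, round(logistic_trend(t, cap, k, m), 4))
```

However far ahead `t` goes, the curve never exceeds the cap, which is the behavior the 1826-day forecast above relies on.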