DNN Rolling Training for 5-Day Stock Selection

Created by bq93t66l, last updated by bq93t66l. Viewed by 9 users.

1. Strategy Overview

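This strategy uses a DNN model that is retrained on a rolling basis for each year from 2018 through 2025. Each training window covers the preceding five years and each test window covers the following year; for example, the 2018 model is trained on 2013-01-01 to 2017-12-31 and tested on 2018-01-01 to 2018-12-31. The data cover the full market for each period and focus on price and volume data and their derivatives, such as the ratio of the 5-day mean to the current value, 5-day price-volume correlations, and cross-sectional percentile ranks. The label is the cross-sectional percentile rank of the future 5-day return. In backtesting, the 200 stocks with the highest predicted scores are selected, and the portfolio is rebalanced every 5 days.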
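The rolling schedule described above can be sketched with plain date arithmetic. This is only an illustration of the windowing: the function name `rolling_windows` and its parameters are hypothetical and do not appear in the original code.

```python
from datetime import date

def rolling_windows(first_test_year=2018, last_test_year=2025, train_years=5):
    """Return a list of (test_year, train_start, train_end, test_start, test_end)."""
    windows = []
    for year in range(first_test_year, last_test_year + 1):
        train_start = date(year - train_years, 1, 1)  # 5 calendar years back
        train_end = date(year - 1, 12, 31)            # ends right before the test year
        test_start = date(year, 1, 1)
        test_end = date(year, 12, 31)
        windows.append((year, train_start, train_end, test_start, test_end))
    return windows

for year, tr_s, tr_e, te_s, te_e in rolling_windows():
    print(f"{year}: train {tr_s} to {tr_e}, test {te_s} to {te_e}")
```

Each test year gets its own model, and no training window overlaps the test year it is evaluated on.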

2. Data Processing

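Data come from the cn_stock_bar1d table. The raw price and volume fields plus the constructed factors give 53 model input features in total. Only the training set is winsorized: features at 4 standard deviations, and label returns at the 1st and 99th percentiles. Test-set features are left unprocessed to avoid leaking future information into the backtest. No standardization is done during data preparation; it is handled inside the model by BatchNorm1d. The implementation is as follows: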

import dai  # data API used for dai.query below
import polars as pl

def get_data(start_date, end_date, is_train=True):

    sql1 = """
    WITH feature_table AS (
        /* Base features */
        SELECT date, instrument, close close_0, open open_0, high high_0, low low_0, amount amount_0, turn * 100 turn_0, change_ratio + 1 return_0, 

        /* 5-day moving-average ratios */
        m_AVG(close,5)/close  ma_close_5,
        m_AVG(turn * 100,5)/turn ma_turn_5, /* numerator is scaled by 100, denominator is not; kept as in the original */
        m_AVG(amount,5)/amount ma_amount_5, 
        m_AVG(change_ratio + 1, 5)/(change_ratio + 1)  ma_cr_5,

        /* 5-day standard deviations */
        m_STDDEV(close, 5) std_close_5,
        m_STDDEV(turn * 100,5) std_turn_5,
        m_STDDEV(amount,5) std_amount_5,
        m_STDDEV(change_ratio + 1,5)  std_cr_5,

        /* 5-day rolling rank percentages */
        m_rolling_rank(close, 5)/5 rank_close_5, 
        m_rolling_rank(low, 5)/5 rank_low_5, 
        m_rolling_rank(open, 5)/5 rank_open_5, 
        m_rolling_rank(high, 5)/5 rank_high_5, 
        m_rolling_rank(turn * 100, 5)/5  rank_turn_5, 
        m_rolling_rank(amount, 5)/5 rank_amount_5, 
        m_rolling_rank(change_ratio+1, 5)/5 rank_cr_5,

        /* 5-day correlations */
        m_CORR(volume, change_ratio+1, 5)  corr_vcr, 
        m_CORR(volume, close, 5)  corr_vc,
        m_CORR(volume, turn * 100, 5)  corr_vt,

        m_CORR(change_ratio+1, close, 5)  corr_crc,
        m_CORR(change_ratio+1, turn, 5)  corr_crt, 
        m_CORR(high, low, 5)  corr_hl,
        m_CORR(high, close, 5)  corr_hc,
        m_CORR(high, open, 5)  corr_ho,
        m_CORR(low, close, 5)   corr_lc,
        m_CORR(low, open, 5)   corr_lo,
        m_CORR(close, open, 5)  corr_co,
        m_CORR(close, turn * 100, 5)  corr_ct,


        /* Cross-sectional percentile ranks */
        c_pct_rank(turn) cross_turn,
        c_pct_rank(change_ratio + 1) cross_change_ratio,
        c_pct_rank(ma_close_5) cross_ma_close_5,
        c_pct_rank(ma_turn_5) cross_ma_turn_5,
        c_pct_rank(ma_amount_5) cross_ma_amount_5,
        c_pct_rank(ma_cr_5) cross_ma_cr_5,
        c_pct_rank(std_close_5) cross_std_close_5,
        c_pct_rank(std_turn_5) cross_std_turn_5,
        c_pct_rank(std_amount_5) cross_std_amount_5,
        c_pct_rank(std_cr_5) cross_max_cr_r,
        c_pct_rank(rank_close_5) cross_rank_close_5,
        c_pct_rank(rank_turn_5) cross_rank_turn_5,
        c_pct_rank(rank_amount_5) cross_rank_amount_5,
        c_pct_rank(rank_cr_5) cross_rank_cr_5,
        c_pct_rank(corr_vcr) cross_corr_vcr,
        c_pct_rank(corr_vc) cross_corr_vc,
        c_pct_rank(corr_vt) cross_corr_vt,
        c_pct_rank(corr_crc) cross_corr_crc,
        c_pct_rank(corr_crt) cross_corr_crt,

        FROM cn_stock_bar1d
        QUALIFY COLUMNS(*) IS NOT NULL
    )
    """

    if is_train:
        print('Extracting training-set data')
        sql2 = """
        /* Label */
        ,
        label_table AS (
            SELECT date, instrument, 
            m_lead(close, 5) / m_lead(open, 1) - 1 AS _future_return, 
            all_quantile_cont(_future_return, 0.01) AS _future_return_1pct, 
            all_quantile_cont(_future_return, 0.99) AS _future_return_99pct, 
            clip(_future_return, _future_return_1pct, _future_return_99pct) AS _label, 
            c_pct_rank(_label) as label, 
            FROM cn_stock_bar1d
            QUALIFY COLUMNS(*) IS NOT NULL AND m_lead(high, 1) != m_lead(low, 1)
        )

        -- Feature standardization removed (handled by BatchNorm1d in the model)
        SELECT date, instrument, label, COLUMNS(feature_table.* EXCLUDE (date, instrument)) FROM feature_table
        INNER JOIN label_table USING (date, instrument)
        ORDER BY date, instrument;
        """
    
    else:
        print('Extracting test-set data')
        sql2 = """
        /* Data extraction */
        SELECT feature_table.* FROM feature_table
        ORDER BY date, instrument
        """
    sql = sql1+sql2 
    df = dai.query(sql, filters={'date': [start_date, end_date]}).df()

    df = pl.from_pandas(df)
    df = df.fill_nan(None)
    df = df.select(pl.all().forward_fill().over('instrument'))
    df = df.fill_null(0)
    if is_train:
        df = df.with_columns(pl.exclude('date','instrument').clip(
            pl.exclude('date','instrument').mean()-4*pl.exclude('date','instrument').std(),
            pl.exclude('date','instrument').mean()+4*pl.exclude('date','instrument').std()
        ))
    #     df = df.with_columns((pl.col('label')-pl.col('label').mean())/(pl.col('label').std()+1e-6))
    return df

def get_train_test(start_year:str='2023'):
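    '''
    Defaults to 5 years of training data and 1 year of test data.
    start_year: first year of the test set; the training set automatically takes the preceding 5 years.
    Windows run from January 1 to December 31.
    '''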
    train_start_date = str(int(start_year)-5)+'-01-01'
    train_end_date = str(int(start_year)-1)+'-12-31'

    test_start_date = start_year+'-01-01'
    test_end_date = start_year+'-12-31'
    train_df = get_data(train_start_date, train_end_date, is_train=True)
    test_df = get_data(test_start_date, test_end_date, is_train=False)
    return train_df, test_df
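To make the training-only winsorization step easier to follow, here is a minimal standalone sketch of clipping at the mean plus or minus 4 standard deviations, using only the standard library. The helper names `fit_clip_bounds` and `clip_values` and the toy data are hypothetical; the original code does the equivalent per column with polars on the full training frame.

```python
from statistics import mean, stdev

def fit_clip_bounds(values, n_sigma=4.0):
    """Return (lower, upper) at mean -/+ n_sigma sample standard deviations."""
    mu = mean(values)
    sigma = stdev(values)  # sample std (ddof=1), the polars default as well
    return mu - n_sigma * sigma, mu + n_sigma * sigma

def clip_values(values, lower, upper):
    """Clamp every value into [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]

# Toy training column: 30 ordinary values and a single extreme outlier.
train_feature = [1.0] * 30 + [50.0]
lower, upper = fit_clip_bounds(train_feature)
clipped = clip_values(train_feature, lower, upper)
print(max(train_feature), "->", round(max(clipped), 3))
```

Only the outlier is pulled in to the upper bound; ordinary values are unchanged. In the post, test-set features skip this step entirely.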

3. Model

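The model is a DNN (multilayer perceptron) with four fully connected layers, preceded by a BatchNorm1d layer that normalizes the inputs. The code is as follows: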

from torch import nn

class DNN(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.pipe = nn.Sequential(
            nn.BatchNorm1d(input_dim),
            nn.Linear(input_dim, 256),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(256,128),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(128,64),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(64,1)
        )
    def forward(self, x):
        y = self.pipe(x)
        return y
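As a rough sense of model size, the learnable parameter count can be worked out by hand. The sketch below assumes `input_dim=53` (the feature count stated in the data section) and PyTorch's default settings, where each Linear layer has a bias and BatchNorm1d has a learnable scale and shift; ReLU and Dropout contribute no parameters. The helper names are hypothetical.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

def dnn_param_count(input_dim=53):
    batchnorm = 2 * input_dim  # learnable scale and shift per input feature
    layers = [(input_dim, 256), (256, 128), (128, 64), (64, 1)]
    return batchnorm + sum(linear_params(i, o) for i, o in layers)

print(dnn_param_count())  # 55147
```

At roughly 55 thousand parameters the network is small relative to a five-year, full-market training set.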

4. Training

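During training, the training set is further split 4:1 into training and validation subsets. The loss function is MSE and the optimizer is Adam. The hyperparameters are: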

batch size 512
learning rate 0.001
max_epochs 50
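The post does not say how the 4:1 split is drawn. One possible approach, shown here purely as an assumption, is a chronological split in which the earliest 80% of dates train the model and the latest 20% validate it. The function `split_by_date` is hypothetical.

```python
def split_by_date(dates, train_fraction=0.8):
    """Split sorted unique dates so the earliest train_fraction go to training."""
    unique_dates = sorted(set(dates))
    cut = int(len(unique_dates) * train_fraction)
    return unique_dates[:cut], unique_dates[cut:]

# Toy example with ten dates (ISO strings sort chronologically).
dates = [f"2017-01-{d:02d}" for d in range(1, 11)]
train_dates, valid_dates = split_by_date(dates)
print(len(train_dates), len(valid_dates))  # 8 2
```

A random row-level split is the other obvious option; with labels built from overlapping 5-day forward returns, a date-based split keeps validation rows from sharing return windows with most training rows.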

5. Results

The backtest rebalances every 5 days, each time selecting the 200 stocks with the highest predicted scores. The benchmark is the CSI 300.
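The selection rule can be sketched as follows. This is a simplified illustration, not the backtest engine used for the results: `select_holdings` and the toy scores are hypothetical, and position sizing and trading costs are left out because the post does not specify them.

```python
def select_holdings(scores_by_date, top_n=200, rebalance_every=5):
    """Return {rebalance_date: [instruments]} with the top_n scores on each rebalance day."""
    holdings = {}
    for i, day in enumerate(sorted(scores_by_date)):
        if i % rebalance_every != 0:
            continue  # hold the existing portfolio between rebalance days
        scores = scores_by_date[day]
        ranked = sorted(scores, key=scores.get, reverse=True)
        holdings[day] = ranked[:top_n]
    return holdings

# Toy predictions for three instruments over seven days.
scores_by_date = {
    f"2018-01-{d:02d}": {"A": 0.1 * d, "B": 0.5, "C": 1.0 - 0.1 * d}
    for d in range(1, 8)
}
picks = select_holdings(scores_by_date, top_n=2)
print(picks)
```

With seven days of scores and a 5-day interval, only the first and sixth days trigger a rebalance.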

Backtest performance by year

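| Year | Benchmark (CSI 300) return | Strategy cumulative return | Excess return |
|---|---|---|---|
| 2018 | -27.63% | -3.01% | +34.02% |
| 2019 | +34.42% | +44.41% | +7.43% |
| 2020 | +26.72% | +11.26% | -12.2% |
| 2021 | -10.1% | +23.83% | +37.73% |
| 2022 | -20.07% | -2.1% | +22.48% |
| 2023 | -14.5% | +1.45% | +18.65% |
| 2024 | +19.75% | +49.46% | +24.81% |
| 2025 | +22.9% | +33.34% | +8.49% |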
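The excess-return column is not a simple difference of the other two columns. The reported figures appear consistent with a geometric excess, (1 + strategy) / (1 + benchmark) - 1, although the post does not state the formula. The check below recomputes it from the table values; the function name is hypothetical.

```python
# (benchmark %, strategy %, reported excess %) copied from the table above
rows = {
    2018: (-27.63, -3.01, 34.02),
    2019: (34.42, 44.41, 7.43),
    2020: (26.72, 11.26, -12.2),
    2021: (-10.1, 23.83, 37.73),
    2022: (-20.07, -2.1, 22.48),
    2023: (-14.5, 1.45, 18.65),
    2024: (19.75, 49.46, 24.81),
    2025: (22.9, 33.34, 8.49),
}

def geometric_excess(strategy_pct, benchmark_pct):
    """Excess return in percent, compounding-aware rather than a plain difference."""
    return ((1 + strategy_pct / 100) / (1 + benchmark_pct / 100) - 1) * 100

for year, (bench, strat, reported) in rows.items():
    print(year, round(geometric_excess(strat, bench), 2), reported)
```

The recomputed values agree with the reported column to within rounding of the inputs.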

Backtest curve, 2018-2025

{link}