[Original] Stata grouping commands by, bys, bysort

Only research experience carefully written up by the author here!
hellohappy
Site administrator
Posts: 267
Registered: November 18, 2018, 14:27


hellohappy » June 5, 2019, 10:56

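Introduction:
    Some readers may still be unfamiliar with Stata's grouping commands. You will see code written with by, with bys, or with bysort. In short, they are the same command. bys is an abbreviation of bysort, and bysort is by with the sort option, so bysort varlist: is equivalent to by varlist, sort:. The only practical difference is that by on its own requires the data to be sorted already, while bysort sorts them first.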

Command overview:

    Help file:
        To introduce the by command, let's start with its help file, which is very detailed.
Help file for by
Title

    [ D ] by -- Repeat Stata command on subsets of the data
    [ U ] 11 Language syntax

Syntax

        by varlist: stata_cmd

        bysort varlist: stata_cmd

    The above diagrams show by and bysort as they are typically used.  The full syntax of the commands is

        by varlist1 [(varlist2)] [, sort rc0]:  stata_cmd

        bysort varlist1 [(varlist2)] [, rc0]:  stata_cmd

Description

    Most Stata commands allow the by prefix, which repeats the command for each group of observations for which the values of the variables in varlist are the same.  by without the sort option requires that the data be sorted by varlist; see [D] sort.

    Stata commands that work with the by prefix indicate this immediately following their syntax diagram by reporting, for example, "by is allowed; see [D] by" or "bootstrap, by, etc., are allowed; see prefix".

    by and bysort are really the same command; bysort is just by with the sort option.

    The varlist1 (varlist2) syntax is of special use to programmers.  It verifies that the data are sorted by varlist1 varlist2 and then performs a by as if only varlist1 were specified.  For instance,

        by pid (time): generate growth = (bp - bp[_n-1])/bp

    performs the generate by values of pid but first verifies that the data are sorted by pid and time within pid.

Options

    sort specifies that if the data are not already sorted by varlist, by should sort them.

    rc0 specifies that even if the stata_cmd produces an error in one of the by-groups, then by is still to run the stata_cmd on the remaining by-groups.  The default action is to stop when an error occurs.  rc0 is especially useful when stata_cmd is an estimation command and some by-groups have insufficient observations.

Examples

    ------------------------------------------------------------------------------------------
    Setup
        . sysuse auto

    For each category of foreign, display summary statistics for rep78
        . by foreign:  summarize rep78

    Same as above command, but check that the data are sorted by foreign and make within foreign
        . by foreign (make):  summarize rep78
        not sorted
        r(5);
        . sort foreign make
        . by foreign (make): summarize rep78

    For each category of rep78, display frequency counts of foreign
        . by rep78: tabulate foreign
        not sorted
        r(5);
        . sort rep78
        . by rep78: tabulate foreign

    Equivalent to above two commands
        . by rep78, sort:  tabulate foreign

    Equivalent to above command
        . bysort rep78: tabulate foreign

    For each category of rep78 within categories of foreign, display summary statistics for price
        . by foreign rep78, sort: summarize price

    --------------------------------------------------------------------------------------------------------
    Setup
        . sysuse autornd
        . keep in 1/20

    Store in new variable mean_w the mean value of weight for each category of mpg
        . by mpg, sort: egen mean_w = mean(weight)
    ---------------------------------------------------------------------------------------------------------

Technical note

    by repeats the stata_cmd for each group defined by varlist.  If stata_cmd stores results, only the results from the last group on which stata_cmd executes will be stored.
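The technical note is easy to check for yourself. This is a minimal sketch that assumes the bundled auto dataset, which ships sorted by foreign. The scalar name last_group_mean is only an illustrative name. After by foreign: summarize price, r(mean) should hold the mean of the last group only (foreign == 1).

```stata
sysuse auto, clear                 // auto.dta ships sorted by foreign
by foreign: summarize price        // runs once for foreign==0, then once for foreign==1
* r() now holds only the results of the last by-group (foreign == 1)
scalar last_group_mean = r(mean)   // hypothetical scalar name, for illustration
quietly summarize price if foreign == 1
display "last by-group mean: " last_group_mean
display "foreign == 1 mean:  " r(mean)
```

The two displayed numbers should match. The by prefix does not keep results for every group, so if you need all of them you have to save them group by group, for example with statsby.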

    Simple usage examples:

        1. Summary statistics of price for each group of foreign:

Code:

sysuse auto, clear // load Stata's bundled example dataset
by foreign: summarize price


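        2. Summary statistics of rep78 for each group of foreign. Adding (make) makes by check that the data are sorted by foreign, and by make within foreign, before it runs. by does not sort the data itself, so you must first sort by the combination foreign, make: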

Code:

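sysuse auto, clear // load Stata's bundled example dataset
by foreign (make):  summarize rep78
// The command above stops with "not sorted" r(5): auto.dta is sorted by foreign, but not by make within foreign.
// So sort first, then run it again:
sort foreign make
by foreign (make):  summarize rep78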
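The (varlist2) check matters most when a command refers to neighbouring observations with _n, as in the help file's by pid (time) example. The sketch below uses a small made-up panel; pid, time and bp and their values are hypothetical. Groups are defined by pid only, so bp[_n-1] restarts at each person's first row. The (time) part only verifies that rows are in time order within pid.

```stata
clear
* hypothetical panel: pid = person id, time = visit number, bp = blood pressure
input pid time bp
1 1 120
1 2 126
2 1 140
2 2 133
end
sort pid time                      // make sure the (time) check passes
* within each pid, _n restarts at 1, so bp[_n-1] is missing in the first row
by pid (time): generate growth = (bp - bp[_n-1])/bp
list, sepby(pid)
```

growth should be missing at each person's first visit, and it should never mix values from two different people.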


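        3. Sort by the combination foreign and rep78, then summarize price within each foreign-rep78 group:
            Using bysort, or adding the sort option to by, lets you skip the separate sort step. The following three forms are all valid and equivalent: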

Code:

by foreign rep78, sort: summarize price
bysort foreign rep78: summarize price
bys foreign rep78: summarize price
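Here is a minimal sketch on auto.dta showing why the sort option matters. Running sort make first is a step added only for illustration: it deliberately breaks the foreign rep78 order. After that, plain by fails with r(5). bys sorts first, and it also leaves the data sorted, so plain by works again afterwards.

```stata
sysuse auto, clear
sort make                             // reorder by make: data are no longer sorted by foreign rep78
capture noisily by foreign rep78: summarize price
display "return code: " _rc           // 5 = not sorted
bys foreign rep78: summarize price    // sorts by foreign rep78, then runs by
* bys left the data sorted by foreign rep78, so plain by now works
by foreign rep78: summarize price
```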


    Other notes:

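        Besides taking by as a prefix, some commands also accept by as an option. egen is one example: it has a by() option, and the two lines below give the same result. The prefix form needs the data sorted by foreign (auto.dta already is). The second line uses a different variable name because the first has already created total_price: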

Code:

by foreign: egen total_price = total(price)    // prefix form; data must be sorted by foreign
egen total_price2 = total(price), by(foreign)  // option form; same values as total_price
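This is a minimal sketch to confirm on auto.dta that the two forms agree; total_price1 and total_price2 are illustrative names. Here the prefix form uses bysort, so it does not rely on the data already being sorted. The by() option form does not need a prior sort either.

```stata
sysuse auto, clear
bysort foreign: egen total_price1 = total(price)   // prefix form
egen total_price2 = total(price), by(foreign)      // option form
assert total_price1 == total_price2                // stops with an error if any value differs
* first row is a domestic car, last row a foreign one (data now sorted by foreign)
list foreign total_price1 total_price2 if inlist(_n, 1, _N)
```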
