pandas: powerful Python data analysis toolkit - 0.14.0
DataFrame(dict(household_id = [1,2,3], ....: male = [0,1,0], ....: wealth = [196087.3,316478.7,294750]), ....: columns = ['household_id','male','wealth'] ....: ).set_index('household_id') ....: In [71]: household Out[71]: male wealth household_id 1 0 196087.3 2 1 316478.7 3 0 294750.0 In [72]: portfolio = DataFrame(dict(household_id = [1,2,2,3,3,3,4], ....: asset_id = ["nl0000301109","nl0000289783","gb00b03mlx29" = ['household_id','asset_id','name','share'] 1.1. v0.14.0 (May 31, 2014) 21 pandas: powerful Python data analysis toolkit, Release 0.14.0 ....: ).set_index(['household_id','asset_id']) ....: In [73]:
0 credits | 1349 pages | 7.67 MB | 1 year ago
pandas: powerful Python data analysis toolkit - 0.13.1
dict(zip(range(3), np.random.randn(3))) .....: }) .....: In [106]: df["id"] = df.index In [107]: df Out[107]: A1970 A1980 B1970 B1980 X id 0 a d 2.5 3.2 -1.085631 0 1 b e 1.2 1.3 0.997345 1 2 c f 0.7 0 0.1 0.282978 2 [3 rows x 6 columns] In [108]: wide_to_long(df, ["A", "B"], i="id", j="year") Out[108]: X A B id year 0 1970 -1.085631 a 2.5 1 1970 0.997345 b 1.2 2 1970 0.282978 c 0.7 0 1980 -1.085631 Google BigQuery Project ID # To find this, see your dashboard: # https://code.google.com/apis/console/b/0/?noredirect projectid = xxxxxxxxx; df = gbq.read_gbq(query, project_id = projectid) # Use pandas
0 credits | 1219 pages | 4.81 MB | 1 year ago
pandas: powerful Python data analysis toolkit - 0.15
For full docs, see the categorical introduction and the API documentation. In [1]: df = DataFrame({"id":[1,2,3,4,5,6], "raw_grade":['a', 'b', 'b', 'a', 'a', 'e']}) In [2]: df["grade"] = df["raw_grade"] Categories (5, object): [very bad < bad < medium < good < very good] In [7]: df.sort("grade") Out[7]: id raw_grade grade 5 6 e very bad 1 2 b good 2 3 b good 0 1 a very good 3 4 a very good 4 5 a very DataFrame(dict(household_id = [1,2,3], ....: male = [0,1,0], ....: wealth = [196087.3,316478.7,294750]), ....: columns = ['household_id','male','wealth'] ....: ).set_index('household_id') ....: In [71]:
0 credits | 1579 pages | 9.15 MB | 1 year ago
pandas: powerful Python data analysis toolkit - 0.15.1
For full docs, see the categorical introduction and the API documentation. In [1]: df = DataFrame({"id":[1,2,3,4,5,6], "raw_grade":['a', 'b', 'b', 'a', 'a', 'e']}) In [2]: df["grade"] = df["raw_grade"] Categories (5, object): [very bad < bad < medium < good < very good] In [7]: df.sort("grade") Out[7]: id raw_grade grade 5 6 e very bad 1 2 b good 2 3 b good 0 1 a very good 3 4 a very good 4 5 a very DataFrame(dict(household_id = [1,2,3], ....: male = [0,1,0], ....: wealth = [196087.3,316478.7,294750]), ....: columns = ['household_id','male','wealth'] ....: ).set_index('household_id') ....: In [71]:
0 credits | 1557 pages | 9.10 MB | 1 year ago
pandas: powerful Python data analysis toolkit - 0.17.0
[1]: import statsmodels.formula.api as sm In [2]: bb = pd.read_csv('data/baseball.csv', index_col='id') # sm.poisson takes (formula, data) In [3]: (bb.query('h > 0') ...: .assign(ln_h = lambda df: np For full docs, see the categorical introduction and the API documentation. In [1]: df = DataFrame({"id":[1,2,3,4,5,6], "raw_grade":['a', 'b', 'b', 'a', 'a', 'e']}) In [2]: df["grade"] = df["raw_grade"] Categories (5, object): [very bad, bad, medium, good, very good] In [7]: df.sort("grade") Out[7]: id raw_grade grade 5 6 e very bad 1 2 b good 2 3 b good 0 1 a very good 3 4 a very good 4 5 a very
0 credits | 1787 pages | 10.76 MB | 1 year ago
pandas: powerful Python data analysis toolkit - 0.19.1
Python data analysis toolkit, Release 0.19.1 In [20]: bb = pd.read_csv('data/baseball.csv', index_col='id') In [21]: (bb.groupby(['year', 'team']) ....: .sum() ....: .loc[lambda df: df.r > 100] ....: ) [1]: import statsmodels.formula.api as sm In [2]: bb = pd.read_csv('data/baseball.csv', index_col='id') # sm.poisson takes (formula, data) In [3]: (bb.query('h > 0') ...: .assign(ln_h = lambda df: np For full docs, see the categorical introduction and the API documentation. In [1]: df = DataFrame({"id":[1,2,3,4,5,6], "raw_grade":['a', 'b', 'b', 'a', 'a', 'e']}) In [2]: df["grade"] = df["raw_grade"]
0 credits | 1943 pages | 12.06 MB | 1 year ago
pandas: powerful Python data analysis toolkit - 0.19.0
operations without using temporary variable. In [20]: bb = pd.read_csv('data/baseball.csv', index_col='id') In [21]: (bb.groupby(['year', 'team']) ....: .sum() ....: .loc[lambda df: df.r > 100] ....: ) [1]: import statsmodels.formula.api as sm In [2]: bb = pd.read_csv('data/baseball.csv', index_col='id') # sm.poisson takes (formula, data) In [3]: (bb.query('h > 0') ...: .assign(ln_h = lambda df: np For full docs, see the categorical introduction and the API documentation. In [1]: df = DataFrame({"id":[1,2,3,4,5,6], "raw_grade":['a', 'b', 'b', 'a', 'a', 'e']}) In [2]: df["grade"] = df["raw_grade"]
0 credits | 1937 pages | 12.03 MB | 1 year ago
pandas: powerful Python data analysis toolkit - 0.12
'pandas.core.frame.DataFrame'> Int64Index: 100 entries, 88641 to 89534 Data columns (total 22 columns): id 100 non-null values year 100 non-null values stint 100 non-null values team 100 non-null values though it won’t always fit the console width: In [95]: print baseball.iloc[-20:, :12].to_string() id year stint team lg g ab r h X2b X3b hr 89474 finlest01 2007 1 COL NL 43 94 9 17 3 0 1 89480 embreal01 melt(cheese, id_vars=['first', 'last']) first last variable value 0 John Doe height 5.5 1 Mary Bo height 6.0 2 John Doe weight 130.0 3 Mary Bo weight 150.0 In [30]: melt(cheese, id_vars=['first'
0 credits | 657 pages | 3.58 MB | 1 year ago
pandas: powerful Python data analysis toolkit - 0.21.1
In [12]: df.rename(str.lower, axis='columns') Out[12]: a b 0 1 4 1 2 5 2 3 6 In [13]: df.rename(id, axis='index') Out[13]: 10 Chapter 1. What’s New pandas: 0 5.0 3 NaN NaN The “index, columns” style continues to work as before. In [16]: df.rename(index=id, columns=str.lower) Out[16]: a b 4503833264 1 4 4503833296 2 5 4503833328 3 6 In [17]: df.reindex(index=[0 Python data analysis toolkit, Release 0.21.1 In [20]: bb = pd.read_csv('data/baseball.csv', index_col='id') In [21]: (bb.groupby(['year', 'team']) ....: .sum() ....: .loc[lambda df: df.r > 100] ....: )
0 credits | 2207 pages | 8.59 MB | 1 year ago
pandas: powerful Python data analysis toolkit - 0.24.0
Period('2000-01-02', 'D'), Period('2000-01-03', 'D'), Period('2000-01-04', 'D')], dtype=object) In [19]: id(idx.values) Out[19]: 139878025578576 In [20]: id(idx.values) full docs, see the categorical introduction and the API documentation. In [127]: df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6], .....: "raw_grade": ['a', 'b', 'b', 'a', 'a', 'e']}) .....: Convert the raw
0 credits | 2973 pages | 9.90 MB | 1 year ago
32 results in total













