Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics Spring 2020
… Analytics Vasiliki (Vasia) Kalavri vkalavri@bu.edu Spring 2020 4/23: Cardinality and frequency estimation. Vasiliki Kalavri | Boston University 2020. Counting distinct elements … probability • Counter overestimation is almost certain for very large data streams with high-frequency elements. Counting Bloom Filter • A space-efficient … Estimating frequency [figure: arrays of m counters; hash functions h1, h2, hp]
0 credits | 69 pages | 630.01 KB | 1 year ago
pandas: powerful Python data analysis toolkit - 0.12
… columns, as in an SQL table or Excel spreadsheet • Ordered and unordered (not necessarily fixed-frequency) time series data. • Arbitrary matrix data (homogeneously typed or heterogeneous) with row and … from the ultrafast HDF5 format • Time series-specific functionality: date range generation and frequency conversion, moving window statistics, moving window linear regressions, date shifting and lagging … warn with an AttributeConflictWarning if you are attempting to append an index with a different frequency than the existing, or attempting to append an index with a different name than the existing – support …
0 credits | 657 pages | 3.58 MB | 1 year ago
pandas: powerful Python data analysis toolkit - 0.14.0
… columns, as in an SQL table or Excel spreadsheet • Ordered and unordered (not necessarily fixed-frequency) time series data. • Arbitrary matrix data (homogeneously typed or heterogeneous) with row and … from the ultrafast HDF5 format • Time series-specific functionality: date range generation and frequency conversion, moving window statistics, moving window linear regressions, date shifting and lagging … array of whether the timestamp(s) are at the start/end of the month/quarter/year defined by the frequency of the DateTimeIndex / Timestamp (GH4565, GH6998) • Local variable usage has changed in pandas …
0 credits | 1349 pages | 7.67 MB | 1 year ago
pandas: powerful Python data analysis toolkit - 0.15
… DateArray properties and functions … 3.3 Frequency conversion … 20.4 Frequency Conversion … columns, as in an SQL table or Excel spreadsheet • Ordered and unordered (not necessarily fixed-frequency) time series data. • Arbitrary matrix data (homogeneously typed or heterogeneous) with row and …
0 credits | 1579 pages | 9.15 MB | 1 year ago
pandas: powerful Python data analysis toolkit - 0.15.1
… DateArray properties and functions … 3.3 Frequency conversion … 20.4 Frequency Conversion … columns, as in an SQL table or Excel spreadsheet • Ordered and unordered (not necessarily fixed-frequency) time series data. • Arbitrary matrix data (homogeneously typed or heterogeneous) with row and …
0 credits | 1557 pages | 9.10 MB | 1 year ago
pandas: powerful Python data analysis toolkit - 0.13.1
… columns, as in an SQL table or Excel spreadsheet • Ordered and unordered (not necessarily fixed-frequency) time series data. • Arbitrary matrix data (homogeneously typed or heterogeneous) with row and … from the ultrafast HDF5 format • Time series-specific functionality: date range generation and frequency conversion, moving window statistics, moving window linear regressions, date shifting and lagging … divided by another timedelta64[ns] object, or astyped to yield a float64 dtyped Series. This is frequency conversion. See the docs for the docs. In [69]: from datetime import timedelta In [70]: td = …
0 credits | 1219 pages | 4.81 MB | 1 year ago
pandas: powerful Python data analysis toolkit - 0.7.1
… columns, as in an SQL table or Excel spreadsheet • Ordered and unordered (not necessarily fixed-frequency) time series data. • Arbitrary matrix data (homogeneously typed or heterogeneous) with row and … from the ultrafast HDF5 format • Time series-specific functionality: date range generation and frequency conversion, moving window statistics, moving window linear regressions, date shifting and lagging … text (GH717) • Added abs method to pandas objects • Added crosstab function for easily computing frequency tables • Added isin method to index objects • Added level argument to xs method of DataFrame.
0 credits | 281 pages | 1.45 MB | 1 year ago
pandas: powerful Python data analysis toolkit - 0.7.2
… columns, as in an SQL table or Excel spreadsheet • Ordered and unordered (not necessarily fixed-frequency) time series data. • Arbitrary matrix data (homogeneously typed or heterogeneous) with row and … from the ultrafast HDF5 format • Time series-specific functionality: date range generation and frequency conversion, moving window statistics, moving window linear regressions, date shifting and lagging … text (GH717) • Added abs method to pandas objects • Added crosstab function for easily computing frequency tables • Added isin method to index objects • Added level argument to xs method of DataFrame.
0 credits | 283 pages | 1.45 MB | 1 year ago
pandas: powerful Python data analysis toolkit - 0.24.0
… allowing indexing with Timedelta object (GH20464) • Bug in DatetimeIndex where frequency was being set if original frequency was None (GH22150) • Bug in rounding methods of DatetimeIndex (round(), ceil() … (GH23601) • Bug in date_range() when decrementing a start date to a past end date by a negative frequency (GH23270) • Bug in Series.min() which would return NaN instead of NaT when called on a series of … DataFrame.combine() with datetimelike values raising a TypeError (GH23079) • Bug in date_range() with frequency of Day or higher where dates sufficiently far in the future could wrap around to the past instead …
0 credits | 2973 pages | 9.90 MB | 1 year ago
pandas: powerful Python data analysis toolkit - 0.7.3
… columns, as in an SQL table or Excel spreadsheet • Ordered and unordered (not necessarily fixed-frequency) time series data. • Arbitrary matrix data (homogeneously typed or heterogeneous) with row and … from the ultrafast HDF5 format • Time series-specific functionality: date range generation and frequency conversion, moving window statistics, moving window linear regressions, date shifting and lagging … text (GH717) • Added abs method to pandas objects • Added crosstab function for easily computing frequency tables • Added isin method to index objects • Added level argument to xs method of DataFrame.
0 credits | 297 pages | 1.92 MB | 1 year ago
164 results in total













