Polars: py-0.19.0 Release

Release date:
August 30, 2023
Previous version:
py-0.18.15 (released August 15, 2023)
Magnitude:
23,909 Diff Delta
Contributors:
27 total committers
Commits:
133 commits in this release

Top Contributors in py-0.19.0

stinodego
ritchie46
orlp
alexander-beedie
MarcoGorelli
reswqa
aminalaee
sdamashek
svaningelgem
mcrumiller

Release Notes

An upgrade guide is available on our website.

πŸ† Highlights

  • Implement sink_csv for LazyFrame (#10682)
  • Support DataFrame init from queries against users' existing database connections (#10649)
  • Rename groupby to group_by (#10656)

💥 Breaking changes

  • return f64 for rank when method="average" (#10734)
  • Update a lot of error types (#10637)
  • Remove deprecated behavior from vertical aggregations (#10602)
  • Read/write support for IPC streams in DataFrames (#10606)
  • Change behavior of all - fix Kleene logic implementation for all/any (#10564)
  • Improve consistency of parsing expression input (#9512)
  • allow from_arrow to take a generator of RecordBatches, change error type to TypeError (#10529)
  • remove fixed_seed and add pl.set_random_seed (#10388)
  • Make arange an alias for int_range (#9983)
  • date_range/time_range no longer return a List type (#10526)
  • Remove various functionalities deprecated before 0.18 (#10527)
  • Improve some error types and messages (#10470)

⚠️ Deprecations

  • Rename map to map_batches (#10801)
  • Rename GroupBy.apply to map_groups (#10799)
  • Rename DataFrame.apply to map_rows (#10797)
  • Rename Series/Expr.rolling_apply to rolling_map (#10750)
  • Rename Series/Expr.apply to map_elements (#10678)
  • Rename groupby to group_by (#10656)
  • Deprecate some parameters of cut/qcut (#10484)

🚀 Performance improvements

  • parse time zones outside of downcast_iter() in replace_time_zone (#10713)
  • use binary abstraction for atan2 (#10588)
  • use binary abstraction in pow (#10562)

✨ Enhancements

  • activate cse for group_by (again) (#10749)
  • implementing sink_csv for LazyFrame (#10682)
  • Support unique, arg_unique, and n_unique for List Series (#10743)
  • repeat_by should also support broadcasting of LHS (#10735)
  • deprecate 'use_earliest' argument in favour of 'ambiguous', which can take expressions (#10719)
  • is_first also supports numeric list type. (#10727)
  • improve slice pushdown in unions (#10723)
  • Explicitly implement Protocol for interchange classes (#10688)
  • Support min and max strategy for binary & str columns fill null (#10673)
  • support broadcasting in list set operations (#10668)
  • csv: add schema argument (#10665)
  • Support DataFrame init from queries against users' existing database connections (#10649)
  • add truncate_ragged_lines (#10660)
  • support casting to List (#10623)
  • Update a lot of error types (#10637)
  • preserve whitespace in notebook output (#10644)
  • Remove deprecated behavior from vertical aggregations (#10602)
  • support selector usage in write_excel arguments (#10589)
  • Add LazyFrame.collect_async and pl.collect_all_async (#10616)
  • Read/write support for IPC streams in DataFrames (#10606)
  • propagate nulls in is_in, and more generic array construction (#10614)
  • Change behavior of all - fix Kleene logic implementation for all/any (#10564)
  • frame-level cast support (#10504)
  • Improve consistency of parsing expression input (#9512)
  • Add failed column to cast exception (#10507)
  • allow from_arrow to take a generator of RecordBatches, change error type to TypeError (#10529)
  • Remove deprecated get_idx_type - use get_index_type instead (#10556)
  • Make arange an alias for int_range (#9983)
  • date_range/time_range no longer return a List type (#10526)
  • Remove various functionalities deprecated before 0.18 (#10527)
  • Improve some error types and messages (#10470)
  • suggest str.to_datetime instead of apply and stdlib strptime (#10266)

🐞 Bug fixes

  • get_single_leaf can't handle Expr::Count (#10790)
  • support groupby literal in streaming (#10771)
  • ORDER BY on unselected columns (#10752)
  • Fix is_in failing to cast list type for float (#10769)
  • whitespace CSS in Notebook HTML updated to use pre-wrap instead of pre (#10739)
  • only preserve sortedness flag in replace_time_zone when safe (#10738)
  • Error on value_counts on column named "counts" (#10737)
  • return f64 for rank when method="average" (#10734)
  • Keep min/max and arg_min/arg_max consistent. (#10716)
  • use time zone from dtype to overwrite output time zone when initialising Series (#10689)
  • Cast small int types when scanning CSV in streaming mode. (#10679)
  • raise exception with invalid on arg type for join_asof (#10690)
  • Reused input series in rolling_apply should not be orderly (#10694)
  • re-sort buffer when an update window swaps the whole buffer (#10696)
  • Set the correct fast_explode flag for ListUtf8ChunkedBuilder (#10684)
  • Sorted Utf8Chunked max_str and min_str should consider null values (#10675)
  • Correctly handle time zones in write_delta (#10633)
  • fix apply for empty series in threading mode (#10651)
  • respect 'ignore_errors=False' in csv parser (#10641)
  • fix rename + projection pushdown (#10624)
  • fix int/float downcast in is_in (#10620)
  • Change behavior of all - fix Kleene logic implementation for all/any (#10564)
  • Fix serialization for categorical chunked. (#10609)
  • Take input_schema to create physical expr for Selection (#10571)
  • Clear window cache after evaluating predicate expression (#10505)
  • Parsing regex col in Expr::Columns (#10551)
  • sanitize column naming in boolean ops (#10531)
  • Fix write_delta with schema in delta_write_options (#10541)
  • remove fixed_seed and add pl.set_random_seed (#10388)
  • respect pl.Config options relating to shape, column names, and types when rendering HTML (#10449)

πŸ› οΈ Other improvements

  • update cargo.lock (#10800)
  • Create .venv in repo root (#10789)
  • refactored write_database unit tests to properly separate concerns (#10773)
  • Fix some broken links / formatting (#10772)
  • Document chained when-then behaviour more prominently (#10759)
  • Fix test failing due to new adbc release (#10763)
  • Unpin connectorx and bump other Python dependencies (#10753)
  • add note to testing docs about module import (#10741)
  • Clear GitHub Actions caches weekly (#10715)
  • Update for new pyarrow 13.0.0 behavior (#10691)
  • Fix minor issue with sink_parquet docs (#10669)
  • Remove deprecate_renamed_methods util (#10537)
  • add "see also" entries to ne/eq_missing and update related examples (#10667)
  • fix potential memory leak from usage of inspect.currentframe (#10630)
  • give more relevant example for polars.apply (#10631)
  • Bump ruff and enable new setting (#10626)
  • Add docstrings for Expr.meta namespace (#10617)
  • Enforce up-to-date Cargo.lock (#10555)
  • deprecate DataFrame.replace (#10600)
  • ensure that make requirements fully refreshes unpinned packages/deps (#10591)
  • fix out-of-date explain default parameter (#10566)
  • Fix expr_dispatch decorator to work on methods with decorators (#10549)
  • Fix link to source code (#10542)
  • Add title to index page (#10539)
  • Disable SIM108 lint (#10519)
  • Keep versioned docs (#10500)
  • switch to pyo3/maturin-action (#10503)
  • Update URLs for dev documentation (#10495)
  • Skip failing test (#10496)
  • Add version switcher to API reference (#10488)

Thank you to all our contributors for making this release possible! @JulianCologne, @MarcoGorelli, @Object905, @OndrejSlamecka, @SeanTroyUWO, @VasanthakumarV, @alexander-beedie, @aminalaee, @braaannigan, @c-peters, @ion-elgreco, @lorepozo, @marki259, @mcrumiller, @messense, @orlp, @owrior, @rben01, @reswqa, @ritchie46, @sdamashek, @stinodego, @svaningelgem, @titoeb, @trueb2, @washcycle and @zundertj