Highlights
- improve join performance through radix partitioned join (#12270)
Deprecations
- Rename `write_csv` parameter `has_header` to `include_header` (#12351)
- Deprecate `_saturating` in duration string language, make it the default (#12301)
- Switch args for `Decimal` and set default `scale=0` (#12224)
- Rename `dt.seconds` to `dt.total_seconds` (likewise for days, hours, minutes, milliseconds, microseconds, and nanoseconds) (#12179)
- Deprecate `DataFrame.as_dict` positional input (#12131)
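
The `dt.seconds` to `dt.total_seconds` rename concerns the whole length of a duration rather than one component of it. Python's standard-library `timedelta` draws the same distinction, so the sketch below uses it (not Polars itself) to show why the explicit name is less ambiguous. The sample value is an assumption chosen for illustration.

```python
from datetime import timedelta

# Illustrative value only: one day plus thirty seconds.
delta = timedelta(days=1, seconds=30)

# `.seconds` is only the seconds *component* of the duration
# (always in the range 0 <= seconds < 86400).
seconds_component = delta.seconds

# `.total_seconds()` is the whole duration expressed in seconds.
whole_duration = delta.total_seconds()

print(seconds_component)  # 30
print(whole_duration)     # 86430.0
```

A name such as `total_seconds` makes it clear that the second reading is the one intended.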
Performance improvements
- indexvec in group-by (#12371)
- Reduce allocations in hash join (#12368)
- Change concurrency parameters (#12321)
- Improve join performance through radix partitioned join (#12270)
- Remove extra multiplication in hash_to_partition (#12233)
- Allow non-power-of-two partitions (#12225)
- Reduce compute in error message for failed datetime parsing (#12147)
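
Several entries above concern the partitioned hash join. As a rough illustration of the general technique, and not of the Polars Rust implementation, the sketch below splits both inputs into partitions by key hash and then runs an ordinary build/probe hash join inside each partition. The names `hash_to_partition`, `partition`, and `partitioned_inner_join`, the mixing constant, and the multiply-shift mapping are all assumptions made for this example. The multiply-shift mapping is one common way to support partition counts that are not powers of two.

```python
from collections import defaultdict

MASK64 = (1 << 64) - 1
# Odd 64-bit constant used to spread small integer hashes over the full
# 64-bit range (an assumption for this sketch, not taken from Polars).
MIX = 0x9E3779B97F4A7C15


def hash_to_partition(h: int, n_partitions: int) -> int:
    """Map a 64-bit hash to the range [0, n_partitions).

    Multiply-shift reduction: no modulo is needed, and n_partitions
    does not have to be a power of two.
    """
    return ((h & MASK64) * n_partitions) >> 64


def partition(rows, n_partitions):
    """Split (key, value) rows into n_partitions buckets by key hash."""
    parts = [[] for _ in range(n_partitions)]
    for key, value in rows:
        h = (hash(key) * MIX) & MASK64
        parts[hash_to_partition(h, n_partitions)].append((key, value))
    return parts


def partitioned_inner_join(left, right, n_partitions=3):
    """Inner-join two lists of (key, value) rows, one partition at a time."""
    out = []
    left_parts = partition(left, n_partitions)
    right_parts = partition(right, n_partitions)
    for left_part, right_part in zip(left_parts, right_parts):
        # Build a small hash table on the right side of this partition...
        table = defaultdict(list)
        for key, value in right_part:
            table[key].append(value)
        # ...and probe it with the left side of the same partition.
        for key, left_value in left_part:
            for right_value in table.get(key, ()):
                out.append((key, left_value, right_value))
    return out


left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right = [(2, "x"), (4, "y"), (5, "z"), (2, "w")]
joined = sorted(partitioned_inner_join(left, right))
print(joined)
```

Because equal keys always hash to the same partition, each partition can be joined independently, which keeps the per-partition hash tables small and lets partitions be processed in parallel.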
Enhancements
- Updated `BytecodeParser` for Python 3.12 (#12348)
- Add `round_sig_figs` expression for rounding to significant figures (#11959)
- Change concurrency parameters (#12321)
- Deprecate `_saturating` in duration string language, make it the default (#12301)
- Auto-infer `ambiguous` for truncate and round (#12204)
- Allow construction of `Datetime` series from `datetime.date` array (#12175)
- New `Config` options for numeric formatting: digit grouping and thousands/decimal separator (#12099)
- Allow non-aggregation predicate in ternary groupby (#12286)
- Add `name=` in `.write_avro` to set schema name (#12255)
- Update `write_delta` to write large arrow types without casting (#12260)
- Add support for reading zstd compressed files (no-options) in read_csv (#12214)
- Start prefetching all files immediately (#12201)
- Expose more options to plugin registration (#12197)
- Add `.list.to_array` expression (#12192)
- Consolidate & improve all casting failure error messages (#12168)
- Add Binary dtype to hypothesis tests (#12140)
- Tunable concurrency (#12171)
- Support reverse sort in streaming (#12169)
- Add `.arr.to_list` expression (#12136)
- Support decimals in assert utils (#12119)
- Add concurrency budget (#12117)
- Improved support for use of file-like objects with `DataFrame` "write" methods (#12113)
- Introduce ignore_nulls for str.concat (#12108)
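
For readers unfamiliar with rounding to significant figures, as offered by the new `round_sig_figs` expression, here is a minimal scalar sketch of the idea in plain Python. The helper name `round_to_sig_figs` is hypothetical, and its handling of ties follows Python's built-in `round`, which may differ from what the Polars expression does.

```python
import math


def round_to_sig_figs(value: float, digits: int) -> float:
    """Round `value` to `digits` significant figures (hypothetical helper)."""
    if digits < 1:
        raise ValueError("digits must be at least 1")
    if value == 0 or not math.isfinite(value):
        return value  # zero, inf and nan are returned unchanged
    # Decimal position of the leading digit:
    # 0 for 1..9, 2 for 100..999, -3 for 0.001..0.009, and so on.
    magnitude = math.floor(math.log10(abs(value)))
    # Keep `digits` digits counted from the leading one.
    return round(value, digits - 1 - magnitude)


print(round_to_sig_figs(123456.0, 2))  # 120000.0
print(round_to_sig_figs(0.001234, 2))  # 0.0012
print(round_to_sig_figs(3.14159, 3))   # 3.14
```

Unlike rounding to a fixed number of decimal places, the number of decimals kept here depends on the magnitude of each value.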
Bug fixes
- Do not cast lit if has same dtype (#12342)
- Fix index column name of rolling/dynamic group by (#12365)
- Ternary broadcasting with empty truthy or falsy and agg predicate (#12357)
- `UInt64` should be correctly extracted from python object (#12338)
- Ignore IDE-mediated DeprecationWarning when debugging tests under 3.12 (#12343)
- expr_output_name include literal (#12335)
- Fix Decimal dtype table repr (#12318)
- Fix behavior of month intervals in `date_range` (#12317)
- Scan empty csv miss row_count (#12316)
- zip_with also broadcast mask (#12309)
- respect hive_partitioning flag when dealing with multiple files (#12315)
- parquet, add row_count to empty file materialization (#12310)
- Fix invalid DeprecationWarning generated from `date_range` defined with 'saturating' interval (#12311)
- fix download ranges in parquet (#12313)
- object store path derivation for local URL (#12308)
- don't move right endpoint of windows in rolling in default `offset==-period` case (#12267)
- Raise more informative error on invalid `reshape` input (#12288)
- incorrect super type for literals in nested binary exprs (#12238)
- typo in exception message (#12278)
- fix ambiguous aggregation type (#12269)
- return frames from `read_excel` in the originally specified order (#12243)
- Consistently propagate nulls for `numpy` ufuncs (#12212)
- respect return_scalar of list scalars (#12251)
- fix plugins system on Windows (#12230)
- potential overflow (#12206)
- always start a new thread if the thread is already blocking (#12202)
- with_row_count should block predicate push down for lazy csv (#12187)
- rechunk failed-list series before iterate (#12189)
- Fix interchange protocol boolean buffer size (#12177)
- fix incorrect desc sort behavior (#12141)
- `take` should block predicate pushdown (#12130)
- use null type when read from unknown row (#12128)
- boundary predicate to block all accumulated predicates in push down (#12105)
- make python `schema_overrides` information available to the rust-side inference code when initialising from records/dicts (#12045)
- fix panic when initializing Series with array of list dtype (#12148)
- Fix schema of arr.min/max (#12127)
- ensure filter predicate inputs exist in schema (#12089)
- Update `null_count` after arithmetic (#12280)
Other improvements
- Workaround for maturin issue (#12370)
- Fix incorrect boundary column name in `group_by_dynamic` docstrings (#12366)
- Fix typo in `rolling_*` docstrings (#12362)
- Fix ruff linting invocation (#12350)
- Clean up conversion utils (#11789)
- Organize Cargo.toml (#12323)
- Consolidate "getting started" and "user guide" sections (#12246)
- Minor updates to prepare for Python 3.12 support (#12314)
- Move script for testing map warning (#12306)
- simplify expr checking in predicate push down (#12287)
- Remove external link (#12223)
- Fix rebase issue breaking CI (#12296)
- Add top-level `make clippy`, simplify Rust linting workflows (#12290)
- ensure we git-ignore ALL `.venv` dirs (#12289)
- incorrect super type for literals in nested binary exprs (#12238)
- Remove recommended setting from IDE docs (#12275)
- Clean up Python test workflow (#12261)
- clarify contains selector (#12265)
- Add `py-polars` to Cargo workspace (#12256)
- Use `.with_columns` in some docstrings (#12250)
- Add test for `scan_csv` plus `slice` (#12239)
- Fix emphasis formatting in docstring (#12240)
- Fix emphasis formatting in docstring (#12237)
- add deprecation notices to the docs for expressions moved into the new `name` namespace (#12236)
- update Cargo.lock (#12226)
- make sort test work with unstable sort (#12221)
- Build Python wheels on `manylinux_2_28` (#12211)
- Include `rust-toolchain.toml` with sdist/wheels (#12184)
- Standardize project name formatting across docs (#12185)
- Update `sqlparser` to `0.39` (#12173)
- pin ring (#12176)
- Improve `strip_{prefix, suffix}` & `strip_chars_{start, end}` (#12161)
- Fix tests for pyarrow 14 (#12170)
- Fix rendering of note in `DataFrame.fold` (#12164)
- Fix triggers for docs deployment (#12159)
- Refactor some tests (#12121)
- Consolidate contributing info (#12109)
- Fix typo in user-guide/expressions/plugins.md (#12115)
- Render docstring text in single backticks as code (#12096)
- use more ergonomic syntax in select/with_columns where possible (#12101)
- Update CODEOWNERS (#12107)
- visualize plugin directory layout in user guide (#12092)
- Minor tweak in code example in section Expressions/Aggregation (#12033)
- Minor tweak in code example in section Expressions/Missing data (#12080)
- Minor improvements to the docs website (#12084)
Thank you to all our contributors for making this release possible!
@JulianCologne, @MarcoGorelli, @Priyansh121096, @alexander-beedie, @cmdlineluser, @daviskirk, @dependabot, @dependabot[bot], @dgilman, @hirohira9119, @ion-elgreco, @jrycw, @mcrumiller, @moritzwilksch, @nameexhaustion, @orlp, @owrior, @rancomp, @reswqa, @ritchie46, @rob-sil, @stefmolin, @stinodego and @wsyxbcl