🏆 Highlights
- Add support for IO[bytes] and bytes in scan_{...} functions (#18532)
- Add IEJoin algorithm for non-equi joins and support full non-equi joins (#18365)
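IEJoin speeds up joins whose condition consists of inequalities by working from sorted orderings of the inputs instead of testing every pair of rows. The sketch below does not implement IEJoin. It is a naive nested-loop reference, assuming rows are plain Python values, that only shows what a full non-equi join returns: every matching pair, plus each unmatched row from either side paired with None. The function name `full_non_equi_join` is hypothetical and not a Polars API.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

L = TypeVar("L")
R = TypeVar("R")


def full_non_equi_join(
    left: List[L],
    right: List[R],
    predicate: Callable[[L, R], bool],
) -> List[Tuple[Optional[L], Optional[R]]]:
    """Naive O(n*m) reference for a full non-equi join (not IEJoin)."""
    out: List[Tuple[Optional[L], Optional[R]]] = []
    right_matched = [False] * len(right)
    for l_row in left:
        matched = False
        for j, r_row in enumerate(right):
            # Any boolean condition works here, e.g. a pair of inequalities.
            if predicate(l_row, r_row):
                out.append((l_row, r_row))
                right_matched[j] = True
                matched = True
        if not matched:
            out.append((l_row, None))  # unmatched left row is kept
    # Unmatched right rows are kept as well, which makes the join "full".
    out.extend((None, r_row) for j, r_row in enumerate(right) if not right_matched[j])
    return out


# Match points to half-open windows using two inequalities: start <= x < end.
result = full_non_equi_join(
    [1, 5, 9],
    [(0, 3), (4, 6), (10, 12)],
    lambda x, w: w[0] <= x < w[1],
)
print(result)
```

An inner non-equi join would keep only the matched pairs; the full variant additionally keeps `(9, None)` and `(None, (10, 12))`.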
🚀 Performance improvements
- Back arrow arrays with SharedStorage which can have non-refcounted static slices (#18666)
- Don't traverse file list twice for extension validation (#18620)
- Remove cloning of ColumnChunkMetadata (#18615)
- Add upfront partitioning in ColumnChunkMetadata (#18584)
- Enable Parquet parallel="prefiltered" for parallel="auto" (#18514)
- Change PlSmallStr impl from Arc<str> to compact_str (#18508)
- Add optimizer rules for is_null().all() and similar expressions to use null_count() (#18359)
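The rule in #18359 rests on a simple equivalence: asking whether every value is null is the same as asking whether the null count equals the column length, so no intermediate boolean mask needs to be built. The notes do not list which "similar expressions" are covered; the four forms below are an assumption. This is a plain-Python sketch, not Polars code, that checks each naive reduction against its null-count rewrite.

```python
from typing import Callable, Dict, Optional, Sequence

Column = Sequence[Optional[object]]


def null_count(values: Column) -> int:
    # Stand-in for a column's null count; a columnar engine can often get
    # this without materialising a per-row boolean mask.
    return sum(v is None for v in values)


# Naive forms: build a boolean mask, then reduce it.
NAIVE: Dict[str, Callable[[Column], bool]] = {
    "is_null().all()": lambda v: all(x is None for x in v),
    "is_null().any()": lambda v: any(x is None for x in v),
    "is_not_null().all()": lambda v: all(x is not None for x in v),
    "is_not_null().any()": lambda v: any(x is not None for x in v),
}

# Assumed rewrites: compare the null count with the length instead.
REWRITTEN: Dict[str, Callable[[Column], bool]] = {
    "is_null().all()": lambda v: null_count(v) == len(v),
    "is_null().any()": lambda v: null_count(v) > 0,
    "is_not_null().all()": lambda v: null_count(v) == 0,
    "is_not_null().any()": lambda v: null_count(v) < len(v),
}

SAMPLES = [[], [None, None], [1, None], [1, 2]]

for name in NAIVE:
    for sample in SAMPLES:
        assert NAIVE[name](sample) == REWRITTEN[name](sample), (name, sample)
```

Including the empty column in the samples matters: all() over nothing is True and any() over nothing is False, and the null-count forms must agree in that edge case too.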
✨ Enhancements
- Update BytecodeParser for upcoming Python 3.13 (#18677)
- Add tooltip by default to charts (#18625)
- Add support for IO[bytes] and bytes in scan_{...} functions (#18532)
- Support shortcut evaluation of common boolean filters in the SQL interface "WHERE" clause (#18571)
- Add IEJoin algorithm for non-equi joins and support full non-equi joins (#18365)
- Make expressions containing Python UDFs serializable (#18135)
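IO[bytes] is the typing annotation for any binary file-like object, such as io.BytesIO or a file opened in "rb" mode, so accepting it together with raw bytes lets a scan function read in-memory data as well as paths. The sketch below shows one way a reader might normalise such a source argument; `open_binary_source` is a hypothetical helper for illustration and does not describe Polars internals.

```python
import io
import os
from typing import IO, Union

# Accepted source kinds: a filesystem path, raw bytes, or a binary stream.
Source = Union[str, "os.PathLike[str]", bytes, IO[bytes]]


def open_binary_source(source: Source) -> IO[bytes]:
    """Return a readable binary stream for a path, bytes, or IO[bytes]."""
    if isinstance(source, bytes):
        return io.BytesIO(source)  # wrap in-memory bytes in a stream
    if isinstance(source, (str, os.PathLike)):
        return open(source, "rb")  # caller is responsible for closing it
    if hasattr(source, "read"):
        return source  # already a binary file-like object; use as-is
    raise TypeError(f"unsupported source type: {type(source).__name__}")


payload = b"a,b\n1,2\n"
from_bytes = open_binary_source(payload).read()
from_stream = open_binary_source(io.BytesIO(payload)).read()
```

Checking bytes before the generic file-like branch keeps the dispatch unambiguous, since a bytes object has no read method but is not a path either.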
🐞 Bug fixes
- Use IO[bytes] instead of BytesIO in DataFrame.write_parquet() (#18652)
- Scalar checks (#18627)
- Scanning hive partitioned files where hive columns are partially included in the file (#18626)
- Enable "polars-json/timezones" feature from "polars-io" (#18635)
- Use Buffer<T> in ObjectSeries, fixes a variety of offset bugs (#18637)
- Properly slice validity mask on pl.Object series (#18631)
- Raise if the single-argument form of replace/replace_strict is not a mapping (#18492)
- Fix group first value after group-by slice (#18603)
- Allow for date/datetime subclasses (e.g. pd.Timestamp, FreezeGun) in pl.lit (#18497)
- Fix output type for list.eval in certain cases (#18570)
- Fix map_elements for List return dtypes (#18567)
- Check for duplicate column names in read_database cursor result, raising DuplicateError if found (#18548)
- Do not remove double-sort if maintain_order=True (#18561)
- Empty any_horizontal should be false, not true (#18545)
- Fix type inference error in map_elements for List types (#18542)
- Address incorrect align_frames result when the alignment column contains NULL values (#18521)
- Fix advertised version in source builds (#18523)
- Handle Parquet projection pushdown with only row index (#18520)
- DataFrame.write_database not passing down "engine_options" when using ADBC (#18451)
- Properly raise on invalid selector expressions (#18511)
- Wrong output column name in "or" and "xor" operations (#18512)
- Normalize by default in Series.entropy like Expr.entropy does (#18493)
- Various schema corrections (#18474)
- Don't drop objects on empty buffers (#18469)
- Expr.sign should preserve dtype (#18446)
- Ensure assert_frame_not_equal and assert_series_not_equal raise on mismatched input types (#18402)
- Fix Worksheet definition in write_excel type annotations (#18452)
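Two of the fixes above change what a result means rather than how it is computed. A horizontal "any" is an OR-reduction across columns, and the identity of OR is false, so zero columns should give false for every row, just as Python's any([]) is False. Entropy normalisation divides the inputs by their sum so they form probabilities before applying -sum(p * log(p)). The plain-Python sketch below illustrates both; it ignores nulls (Polars applies its own null handling) and skips zero probabilities by the usual 0 * log(0) = 0 convention, both of which are assumptions here, and the function names only mirror the Polars methods.

```python
import math
from typing import List


def any_horizontal(columns: List[List[bool]], height: int) -> List[bool]:
    # OR-reduce row by row, starting from False (the identity of OR), so an
    # empty list of columns yields False for every row.
    out = [False] * height
    for col in columns:
        out = [acc or v for acc, v in zip(out, col)]
    return out


def entropy(values: List[float], base: float = math.e, normalize: bool = True) -> float:
    if normalize:
        total = sum(values)
        values = [v / total for v in values]  # turn counts into probabilities
    # -sum(p * ln p), converted to the requested base; zeros are skipped.
    return -sum(p * math.log(p) for p in values if p > 0) / math.log(base)


empty_any = any_horizontal([], height=3)
two_col_any = any_horizontal([[True, False], [False, False]], height=2)
fair_coin_bits = entropy([1, 1], base=2)                   # normalised: one bit
unnormalised = entropy([1, 1], base=2, normalize=False)    # ln(1) terms: zero
```

The last two lines show why the default matters: the same input [1, 1] yields one bit when normalised and zero when it is not.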
📖 Documentation
- Update join_where docs to clarify behaviour (#18670)
- Fix multiprocessing docs regarding fork method check (#18563)
- Various docstring improvements to testing.assert_* functions (#18494)
- Fix formula in ewm_mean_by (#18506)
- Pre-compute plugin_path before defining plugin (#18503)
- Add Expr.null_count to aggregations (#18459)
🛠️ Other improvements
- Fix a bunch of tests for new-streaming (#18659)
- Don't raise on multiple same names in ie_join (#18658)
- Check predicates in join_where (#18648)
- Change join_where semantics (#18640)
- Add benchmark tests for join_where with inequalities (#18614)
- Check number of binary comparisons in join_where predicates (#18608)
- Raise on suffixed predicate in join_where (#18607)
- Fix Python docs build (#18605)
- Use streaming argument in test_parquet_slice_pushdown_non_zero_offset (#18529)
- Fix delta test merge (#18601)
- Alter/skip some tests for new streaming (#18574)
- Add lower-bound pin for numba (#18555)
- Temporarily pin NumPy in CI to address dependency resolving issue (#18544)
- Change PlSmallStr impl from Arc<str> to compact_str (#18508)
- Make expressions containing Python UDFs serializable (#18135)
- Change naming to new benchmark setup (#18473)
- Ensure physical arguments to np ufuncs are rechunked (#18471)
- Remove a string allocation in Parquet (#18466)
- Remove network call in hf docs (#18454)
Thank you to all our contributors for making this release possible!
@0xbe7a, @MarcoGorelli, @WbaN314, @adamreeve, @alexander-beedie, @alonme, @barak1412, @coastalwhite, @dependabot, @dependabot[bot], @eitsupi, @henryharbeck, @ion-elgreco, @krasnobaev, @megaserg, @nameexhaustion, @ohanf, @orlp, @philss, @r-brink, @ritchie46, @skellys, @squnit, @stinodego, @wence- and @yarimiz