💥 Breaking changes
- Make bottom interval closed in hist (#22090)
- Change Partition API to base_path and file_path (#21888)
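The hist change (#22090) concerns which bin receives a value equal to the lowest bin edge. The following is a minimal pure-Python sketch of that rule, not the Polars implementation. It assumes bins are otherwise right-closed, (a, b], and that only the lowest bin becomes [a, b]. The helper name assign_bin is hypothetical.

```python
from bisect import bisect_left


def assign_bin(value, edges):
    """Return the index of the bin containing value, or None if outside.

    Illustrative model only: bins are right-closed, (edges[i], edges[i + 1]],
    except the lowest bin, which is closed on both sides so that a value
    equal to edges[0] is still counted.
    """
    if value < edges[0] or value > edges[-1]:
        return None
    if value == edges[0]:
        return 0  # the closed bottom interval catches the minimum edge
    # bisect_left finds the first edge >= value; its bin is one to the left
    return bisect_left(edges, value) - 1


edges = [0, 1, 2, 3]
counts = [0] * (len(edges) - 1)
for v in [0, 0.5, 1, 2, 3]:
    idx = assign_bin(v, edges)
    if idx is not None:
        counts[idx] += 1
```

Under this model the value 0 lands in the first bin rather than being dropped, which is the behavior the entry title describes.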
🚀 Performance improvements
- Add CSE to streaming groupby (#22196)
- Speed-up new streaming predicate filtering (#22179)
- Speedup new-streaming file row count (#22169)
- Fix quadratic behavior when casting Enums (#22008)
- Lower is_in to bitmap-output semi-join in new streaming engine (#21948)
- Fast path for empty inner join (#21965)
- Add native semi/anti join in new streaming engine (#21937)
- Cache regex compilation globally (#21929)
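The general idea behind a global regex cache (#21929) can be sketched in a few lines of Python. This is only an illustration of the technique, assuming a process-wide cache keyed by the pattern string. The function names and the cache capacity are hypothetical and do not mirror the Rust code in Polars.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=256)  # hypothetical capacity, chosen for illustration
def compiled(pattern: str) -> re.Pattern:
    """Compile a pattern once and reuse the result for later calls."""
    return re.compile(pattern)


def count_matches(values, pattern):
    """Count the strings that contain a match for pattern."""
    rx = compiled(pattern)  # cache hit after the first call per pattern
    return sum(1 for v in values if rx.search(v))


first = count_matches(["a1", "b", "c22"], r"\d+")
second = count_matches(["x", "7"], r"\d+")  # reuses the compiled pattern
```

The second call pays no compilation cost because the pattern string is already in the cache.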
✨ Enhancements
- Add SPLIT_PART string function to the SQL interface (#22158)
- Allow scalar expr in Expr.diff (#22142)
- Support additional unsigned int aliases in the SQL interface (#22127)
- Add STRING_TO_ARRAY function to the SQL interface (#22129)
- Add dt.is_business_day (#21776)
- Add an eager parameter to pl.cov (#22098)
- Add support for Int128 parsing/recognition to the SQL interface (#22104)
- Add an eager parameter to pl.coalesce (#22092)
- Add an eager parameter to pl.corr (#22097)
- Allow sinking to abstract python io and fs classes (#21987)
- Add add_alp_optimize_exprs to IRBuilder (#22061)
- Add cat.slice (#21971)
- Support growing schema if line length increases during csv schema inference (#21979)
- Replace thread unsafe GilOnceCell with Mutex (#21927)
- Support modified dsl in file cache (#21907)
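To illustrate what growing a schema during CSV inference (#21979) means, here is a small standard-library sketch. It is an assumption-laden model, not the Polars reader: it only tracks column names for headerless input, and the column_N naming and the sample_rows parameter are illustrative choices.

```python
import csv
import io


def infer_columns(text, sample_rows=100):
    """Infer column names from a headerless CSV sample.

    The schema starts empty and grows whenever a sampled row has more
    fields than any row seen so far; shorter rows leave it unchanged.
    """
    schema = []
    reader = csv.reader(io.StringIO(text))
    for i, row in enumerate(reader):
        if i >= sample_rows:
            break
        while len(schema) < len(row):
            schema.append(f"column_{len(schema) + 1}")
    return schema


sample = "1,2\n3,4,5\n6\n"
columns = infer_columns(sample)
```

Here the second line is longer than the first, so the inferred schema ends up with three columns instead of being fixed at two.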
🐞 Bug fixes
- Implode in agg (#22197)
- Reduce GIL hold time for IO plugins in new-streaming (#22186)
- Enhance predicate validation and cast safety in join_where (#22112)
- Handle Parquet with compressed empty DataPage v2 (#22172)
- Schema error during lowering (#22175)
- Rewrite unroll of overlapping groups to mitigate out of range index panic (#22072)
- Incorrect rounding for very large/small numbers (#22173)
- Allow set input to list.set_* operations (#22163)
- Deadlock in join due to rayon nested task-stealing (#22159)
- Mark Expr.repeat_by as elementwise (#22068)
- Fix csv serializer panic by supporting ScalarColumn in as_single_chunk (#22146)
- Raise an error if a number doesn't have associated unit in duration strings (#22035)
- Add i128 as supertype to boolean (#22138)
- Fix panic when constructing DF from pyarrow due to duplicate field names (#22114)
- Add broadcasts and error messages for many elementwise operations (#22130)
- Throw error for n=0 on list.gather_every (#22122)
- Throw error for unsupported rolling operations (#22121)
- Error on unequal length str.to_integer arguments (#22100)
- Make bottom interval closed in hist (#22090)
- Relative path resolution for plugin libraries (#21911)
- Avoid panic with strptime for out-of-bounds dates (#21208)
- Join revmaps for categoricals in merge_sorted (#21976)
- Fix glob expansion matching extra files (#21991)
- Ensure SQL dot-notation for nested column fields resolves correctly (#22109)
- Parquet filter performance regression from multiscan dispatch (#22116)
- Panic for unequal length ewm_mean_by args (#22093)
- Add scalarity checks to pl.repeat (#22088)
- Type check n parameter of pl.repeat (#22071)
- Mark bitwise_{count,leading,trailing}_{ones,zeros} as elementwise (#22044)
- Mark pl.*_ranges functions correctly as element-wise (#22059)
- Correctly type check pl.arctan2 (#22060)
- Mark pl.business_day_count as elementwise (#22055)
- Check input python type for str.extract_groups (#22032)
- Check types for fill_char in str.pad_{start,end} (#22036)
- Mark str.to_decimal properly as non-elementwise (#22040)
- Documented return type for bin.encode and bin.decode (#22022)
- Revert #22017 and improve block(_in_place)_on doc comment (#22031)
- Remove outdated depth warning (#22030)
- Expression pl.concat was incorrectly marked as elementwise (#22019)
- Use block_in_place_on to start streaming (#22017)
- Panic on empty aggregation in streaming (#22016)
- Error instead of panic for invalid durations in dt.offset_by() and dt.round() (#21982)
- Raise error instead of silently appending NULL in NDJSON parsing (#21953)
- Ensure AV is static before pushing to row buffer (#21967)
- Deadlock in new-streaming multiplexer (#21963)
- Release GIL in collect_with_callback (#21941)
- Panic in new RegexCache (#21935)
- Type hint of cs.exclude() is SelectorType instead of Expr (#21892)
- Add correct deprecation warning for .str.concat (#21666)
- Use absolute paths by default for plugins (#21904)
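The duration-string fix (#22035) is about rejecting input where a number is not followed by a unit. A minimal sketch of that validation rule in plain Python is given below. It assumes a simplified grammar of repeated number-plus-unit tokens with no sign handling. The parse_duration helper and the UNITS tuple are illustrative and do not reproduce the Polars parser.

```python
import re

# Illustrative unit set; treat it as an assumption, not the full Polars list.
UNITS = ("ns", "us", "ms", "s", "m", "h", "d", "w", "mo", "q", "y")
_TOKEN = re.compile(r"(\d+)([a-z]*)")


def parse_duration(text):
    """Split a duration string such as "1d12h" into (number, unit) pairs.

    Raises ValueError when a number has no unit or the unit is unknown.
    """
    parts = []
    pos = 0
    while pos < len(text):
        m = _TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos} in {text!r}")
        number, unit = m.groups()
        if unit == "":
            raise ValueError(f"expected a unit after {number!r} in {text!r}")
        if unit not in UNITS:
            raise ValueError(f"unknown unit {unit!r} in {text!r}")
        parts.append((int(number), unit))
        pos = m.end()
    return parts


parsed = parse_duration("1d12h")
```

With this rule a string like "1d2" raises an error instead of silently ignoring or guessing a unit for the trailing 2.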
📖 Documentation
- Add user guide section on working with Sheets in Colab (#22161)
- Update distributed engine docs (#22128)
- Add Polars Cloud release notes (#22021)
- Remove trailing space in settings POLARS_CLOUD_CLIENT_ID (#21995)
- Fix typo (#21954)
- Fix 'pickleable' typo in docs (#21938)
- Change ctx to compute=ctx for all remote query examples (#21930)
🛠️ Other improvements
- Remove old MultiScanExec for in-memory (#22184)
- Separate FunctionOptions from DSL calls (#22133)
- Undeprecate backward_fill and forward_fill (#22156)
- Handle conversion of Duration specially in pyir (#22101)
- Deprecate duplicate backward_fill and forward_fill interface (#22083)
- Solve clippy lints for 1.86 (#22102)
- Remove rust exclusive MaxBound and MinBound fill strategies (#22063)
- Change Partition API to base_path and file_path (#21888)
- Fix pydantic model_fields deprecation (#21958)
Thank you to all our contributors for making this release possible!
@DeflateAwning, @EnricoMi, @Jacob640, @JakubValtar, @MarcoGorelli, @MaxJackson, @alexander-beedie, @amotzop, @anath2, @bschoenmaeckers, @cnpryer, @coastalwhite, @dependabot[bot], @eitsupi, @etiennebacher, @hemanth94, @kdn36, @mcrumiller, @nameexhaustion, @orlp, @r-brink, @rgertenbach, @ritchie46, @sebasv, @silannisik, @stijnherfst, @wence-, @zachlefevre and dependabot[bot]