This version includes quite a few breaking changes. We are preparing for the 1.0 release and aim to make the upgrade from 0.20 to 1.0 as smooth as possible. Therefore, we prioritized getting any breaking changes in now rather than with 1.0.
Check out the upgrade guide for help navigating the upgrade to this version.
Please bear with us while we continue to make Polars the best tool it can be!
🏆 Highlights
- Add new `Enum` categorical data type, which allows a fixed set of categories (#11822)
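Conceptually, a categorical type with a fixed set of categories can store each value as an integer code whose meaning is known in advance. The pure-Python sketch below illustrates only that idea. `FixedCategories` is a hypothetical class, not part of Polars, and rejecting unknown values with `ValueError` is an assumption made for illustration.

```python
from typing import Iterable, List, Sequence


class FixedCategories:
    """Hypothetical stand-in for an enum-style categorical: the category list is fixed up front."""

    def __init__(self, categories: Sequence[str]) -> None:
        if len(set(categories)) != len(categories):
            raise ValueError("categories must be unique")
        self.categories = list(categories)
        # Each category's physical code is its position in the declared list.
        self._codes = {cat: code for code, cat in enumerate(self.categories)}

    def encode(self, values: Iterable[str]) -> List[int]:
        """Map values to integer codes, rejecting anything outside the fixed set."""
        codes = []
        for value in values:
            if value not in self._codes:
                raise ValueError(f"{value!r} is not one of {self.categories}")
            codes.append(self._codes[value])
        return codes

    def decode(self, codes: Iterable[int]) -> List[str]:
        """Map integer codes back to their category strings."""
        return [self.categories[code] for code in codes]


sizes = FixedCategories(["small", "medium", "large"])
codes = sizes.encode(["large", "small", "medium"])
print(codes)                # [2, 0, 1]
print(sizes.decode(codes))  # ['large', 'small', 'medium']
# Codes follow the declared order, so "large" > "small" here even though "large" < "small" lexically.
print(codes[0] > codes[1])  # True
```

Because the categories are declared up front, the codes are stable across columns and their order follows the declaration rather than string order.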
💥 Breaking changes
- Use Object Store instead of fsspec for `read_parquet` (#13044)
- Reimplement `replace` expression on the Rust side (#13002)
- Preserve left and right join keys in outer joins (#12963)
- Update `update` signature (#12986)
- Update `Expr.count` to ignore null values by default (#12934)
- Scheduled removal of previously deprecated functionality (#12885)
- Allow all `DataType` objects to be instantiated (#12470)
- Change `value_counts` resulting column name from `counts` to `count` (#12506)
- Change default `join` behavior with regard to nulls; add `join_nulls` parameter to keep existing behavior (#12840)
- Default to exact checking for integers in assertion utils (#12331)
- Set default dtype for Series to `Null` when no data is present (#12807)
- Update `lit` behavior for list/tuple inputs (#12559)
- Change `DataType.is_nested` from property to classmethod (#12453)
- Update constructors for Array and Decimal (#12837)
- Smaller integer data types for datetime components (#12070)
- Fix `NaN` ordering to make NaNs compare greater than any other float, and equal to themselves (#12721)
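The new NaN rule describes a total order on floats: NaN sorts after every other value (including `+inf`), and NaN equals NaN, unlike plain IEEE 754 comparison. The sketch below expresses that rule in plain Python. `total_order_key`, `total_eq` and `is_in` are hypothetical helpers for illustration, not Polars internals.

```python
import math
from typing import List, Tuple


def total_order_key(x: float) -> Tuple[int, float]:
    """Sort key that places NaN after every other float, including +inf."""
    # All NaNs map to the same key, so they tie with (compare equal to) each other.
    if math.isnan(x):
        return (1, 0.0)
    return (0, x)


def total_eq(a: float, b: float) -> bool:
    """Equality under that order: NaN equals NaN; everything else uses ordinary float equality."""
    if math.isnan(a) or math.isnan(b):
        return math.isnan(a) and math.isnan(b)
    return a == b


def is_in(value: float, candidates: List[float]) -> bool:
    """Membership test built on total_eq, so NaN is found in a list that contains NaN."""
    return any(total_eq(value, c) for c in candidates)


nan = float("nan")
ordered = sorted([3.0, nan, -math.inf, math.inf, 1.0], key=total_order_key)
print(ordered)                 # [-inf, 1.0, 3.0, inf, nan]
print(total_eq(nan, nan))      # True, whereas `nan == nan` is False in Python
print(is_in(nan, [1.0, nan]))  # True
```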
⚠️ Deprecations
- Rename `write_database` parameter `if_exists` to `if_table_exists` (#12783)
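For callers, this rename means passing `if_table_exists` where they previously passed `if_exists`. A common way to stage a keyword rename is a shim that still accepts the old name, emits a `DeprecationWarning`, and forwards the value. The sketch below shows that pattern on a hypothetical `write_table` function. It is not Polars's `write_database`, and the option set and default are assumptions.

```python
import warnings
from typing import Optional

_VALID_MODES = {"fail", "replace", "append"}  # assumed option set, for illustration only


def write_table(
    table_name: str,
    *,
    if_table_exists: Optional[str] = None,
    if_exists: Optional[str] = None,  # deprecated alias kept for backward compatibility
) -> str:
    """Hypothetical writer showing how a keyword rename can be rolled out with a warning."""
    if if_exists is not None:
        if if_table_exists is not None:
            raise TypeError("pass `if_table_exists` only; `if_exists` is its deprecated alias")
        warnings.warn(
            "`if_exists` is deprecated; use `if_table_exists` instead",
            DeprecationWarning,
            stacklevel=2,
        )
        if_table_exists = if_exists
    if if_table_exists is None:
        if_table_exists = "fail"  # assumed default
    if if_table_exists not in _VALID_MODES:
        raise ValueError(f"invalid if_table_exists: {if_table_exists!r}")
    # A real writer would talk to the database here; the sketch just reports the resolved mode.
    return f"{table_name}: {if_table_exists}"


print(write_table("sales", if_table_exists="append"))  # sales: append
```

Old call sites keep working during the deprecation window, and the warning points them (via `stacklevel=2`) at the line that needs updating.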
🚀 Performance improvements
- Avoid dispatching to the expression engine for various `Series` methods (#13010)
- Elide allocation in outer join materialization (#12992)
- Avoid dispatching `Series.head/tail` to the expression engine (#12946)
- Ensure we reduce for `any/all_horizontal` (#12976)
- Add fast paths for UTC in `truncate` (#12965)
- Use `select_seq` for expression dispatch (#12962)
- Improve `rolling_median` algorithm (#12704)
- Use fast path for non-null data in new SQL-like null matching (#12874)
- Optimize `DataFrame.iter_rows` for smaller buffer sizes (#12804)
- Speed up initializing `Series` from a list of NumPy arrays (#12785)
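Two entries in these notes concern null keys in joins: the new default, where null keys no longer match (#12840, with `join_nulls` restoring the previous behavior), and a fast path for non-null data in that SQL-like matching (#12874). The pure-Python hash join below is a hypothetical sketch of those semantics, not Polars's implementation. Skipping the null checks when neither key column contains nulls is one plausible form of such a fast path.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Tuple


def inner_join_indices(
    left: List[Optional[Hashable]],
    right: List[Optional[Hashable]],
    join_nulls: bool = False,
) -> List[Tuple[int, int]]:
    """Return (left_row, right_row) pairs whose keys match.

    join_nulls=False: SQL-like, a null (None) key matches nothing, not even another null.
    join_nulls=True:  None is treated as an ordinary key value.
    """
    # Fast path: if neither side contains nulls, skip_nulls stays False and the
    # per-row null checks below short-circuit.
    skip_nulls = not join_nulls and (
        any(k is None for k in left) or any(k is None for k in right)
    )

    # Build phase: hash the right side's keys to their row indices.
    table: Dict[Optional[Hashable], List[int]] = defaultdict(list)
    for j, key in enumerate(right):
        if skip_nulls and key is None:
            continue
        table[key].append(j)

    # Probe phase: look up each left key (.get does not insert into the defaultdict).
    pairs: List[Tuple[int, int]] = []
    for i, key in enumerate(left):
        if skip_nulls and key is None:
            continue
        for j in table.get(key, ()):
            pairs.append((i, j))
    return pairs


left = [1, None, 2]
right = [None, 1, 3]
print(inner_join_indices(left, right))                   # [(0, 1)]
print(inner_join_indices(left, right, join_nulls=True))  # [(0, 1), (1, 0)]
```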
✨ Enhancements
- Add `str.contains_any` and `str.replace_many` (Aho-Corasick algorithms) (#13073)
- Auto-infer credentials from `.aws` folder (#13062)
- Support private cloud S3 storage in `scan_parquet` (#13060)
- Allow order operators (`<`, `>`, `>=`, `<=`) on `Enum` types (#12982)
- Expand the set of NumPy functions that emit the inefficient `map_*` warning (#13039)
- Use tokio semaphore for concurrency handling (#13026)
- Improve and expressify `hist` (#13014)
- Update `describe` to use the new `count` implementation (#12990)
- Add default `to_struct` Series name consistent with the usual default Series name (empty string) (#12998)
- Clarify "inefficient `map_elements`" warning message (#12978)
- Allow `end` before `start` in `date/time_range` (#12964)
- Minor update to `Array` data type repr (#12973)
- Implement group-tuples for `Null` dtype (#12975)
- Cast to an enum from int (#12954)
- Move categorical ordering into dtype (#12911)
- Avoid importing interchange module by default (#12927)
- Raise if expression passed as scalar to DataFrame constructor (#12916)
- Update `repr` of `Struct` data type class (#12922)
- Enable partial predicate pushdown past window expressions (#12710)
- Add `merge` mode to `write_delta` and remove pyarrow to delta conversions (#12392)
- Add `str.reverse` (#12878)
- Specific performance warnings from Rust to Python (#12802)
- Implement `std` and `var` for `Duration` columns (#12865)
- Make `write_database` return the number of rows affected by the operation (#12830)
- Add dedicated `Decimal` selector (#12852)
- Preserve base dtype when raising to `UInt` power (#10446)
- Improve `__repr__` implementation for `Expr` (#12770)
- Support SQL subqueries for `JOIN` and `FROM` (#12819)
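`str.contains_any` and `str.replace_many` are listed as using Aho-Corasick algorithms (#13073). The idea is to compile all patterns into one automaton (a trie plus failure links) so the text is scanned once rather than once per pattern. The plain-Python sketch below covers only a `contains_any`-style check. It is illustrative, not the code Polars ships, and `build_automaton` and `contains_any` here are hypothetical helpers.

```python
from collections import deque
from typing import Dict, List, Sequence, Tuple


def build_automaton(patterns: Sequence[str]) -> Tuple[List[Dict[str, int]], List[int], List[bool]]:
    """Build the trie, the failure links and the 'a pattern ends here' flags."""
    goto: List[Dict[str, int]] = [{}]  # goto[node][char] -> child node; node 0 is the root
    fail: List[int] = [0]              # node for the longest proper suffix that is also a trie path
    out: List[bool] = [False]          # True if some pattern ends at this node or at a suffix of it
    for pattern in patterns:
        node = 0
        for ch in pattern:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append(False)
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node] = True

    # Breadth-first pass: a node's failure link depends only on shallower nodes.
    queue = deque(goto[0].values())  # depth-1 nodes keep fail = 0 (the root)
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            queue.append(child)
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            out[child] = out[child] or out[fail[child]]
    return goto, fail, out


def contains_any(text: str, patterns: Sequence[str]) -> bool:
    """Return True if any pattern occurs in text, scanning text once."""
    goto, fail, out = build_automaton(patterns)
    if out[0]:  # an empty pattern matches every string
        return True
    node = 0
    for ch in text:
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        if out[node]:
            return True
    return False


print(contains_any("the quick brown fox", ["cat", "fox"]))  # True
print(contains_any("hello world", ["cat", "dog"]))          # False
```

The failure links let the scan fall back to the longest suffix that is still a valid pattern prefix, which is why a match like `"bc"` inside `"abce"` is found even after the automaton starts down the `"abcd"` branch.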
🐞 Bug fixes
- Fix off-by-one error in `quantile(method="nearest")` (#13058)
- Fix incorrect schema inference on nested columns (#13057)
- Don't raise for `datetime_range` if starting on ambiguous datetime and earliest was specified (#13050)
- Parse `json_decode` per max buffer length (#13029)
- Parse `00:00` time zone as UTC (#13034)
- Fix timeout errors in concurrent downloads (#13023)
- Streamline `align_frames` and fix edge-case where the identical frame object appears more than once (#13007)
- Fix SQL substring indexing (#13016)
- Allow broadcasting in `ranges` (#11900)
- Prevent deadlock in `sink_csv` (#12991)
- Don't get mutable if buffer is sliced (#12979)
- Support parameterized `read_database` calls against cursors that only take positional args (#12967)
- Fix `truncate` when truncating by multiple weeks (#12948)
- Fix segfault / memory corruption after plugins return `Err` result (#12953)
- Raise a proper typed Python exception when IO writers try to write to a non-existent folder (#12936)
- Don't panic when the `ambiguous` parameter is not Utf8 (#12913)
- Raise a proper typed Python exception when the CSV writer tries to write to a non-existent folder (#12919)
- Patch `rolling_var`/`rolling_std` numerical stability (#12909)
- Fix incorrect Int16 `min`/`max` due to incorrect SIMD mask construction (#12908)
- Improve handling of decimal conversion with `to_numpy` in the absence of pyarrow (#12888)
- Fix OOB error in list set operations on empty frame (#12845)
- Fix error message for uninstantiated `Enum` types (#12886)
- Fix repr of `Expr.gather` (which was still showing deprecated `take`) (#12864)
- Fix `Array` dtype equality (#12853)
- Fix `nan_min/max` incorrectly aggregating chunks with addition (#12848)
- Revert type hint change on expression inputs (#12792)
- More accurate type hinting for `collect_all` functions (#12796)
- Use total float ordering in `is_in` (#12800)
- Handle aggregation for all-NaN groups in `group_by` (#12304)
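The `rolling_var`/`rolling_std` fix (#12909) concerns numerical stability. The textbook pitfall is computing variance from running sums of x and x², which cancels catastrophically when values are large relative to their spread. One standard remedy is Welford's update, extended with a removal step for sliding windows, sketched below in plain Python. These notes do not say which technique the patch uses, so treat this as background rather than a description of Polars's code. Emitting `None` until the window is full is an assumption of the sketch.

```python
from typing import List, Optional, Sequence


def rolling_var(values: Sequence[float], window: int, ddof: int = 1) -> List[Optional[float]]:
    """Sliding-window variance via Welford-style add/remove updates.

    Returns None until the window is full (an assumption for this sketch).
    """
    if window <= ddof:
        raise ValueError("window must be larger than ddof")
    out: List[Optional[float]] = []
    n = 0
    mean = 0.0
    m2 = 0.0  # sum of squared deviations from the current mean
    for i, x in enumerate(values):
        # Add the incoming value.
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        # Remove the value that just left the window.
        if n > window:
            old = values[i - window]
            n -= 1
            delta = old - mean
            mean -= delta / n
            m2 -= delta * (old - mean)
        if n < window:
            out.append(None)
        else:
            # Clamp tiny negative values that rounding can produce.
            out.append(max(m2, 0.0) / (n - ddof))
    return out


print(rolling_var([1.0, 2.0, 3.0, 4.0, 5.0], window=3))  # [None, None, 1.0, 1.0, 1.0]
# A large offset does not disturb the result, because only deviations from the mean are accumulated.
data = [1e9 + v for v in (4.0, 7.0, 13.0, 16.0)]
print(rolling_var(data, window=4))  # [None, None, None, 30.0]
```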
🛠️ Other improvements
- Update version switcher for 0.20 (#12844)
- Add upgrade guide for Python Polars 0.20 (#12872)
- Run doctests before other tests (#13047)
- Update `describe` calculation of min/max (#13027)
- Minor typo fix (#13003)
- Resolve two interchange tests failing locally (#12999)
- Update outdated links to API in Expressions/Functions page (#12981)
- Expand docstrings for `count` (#12960)
- Fix issue with docs for `group_by_dynamic` (#12906)
- Prefer explicit `--no-cov` flag for py3.12/ubuntu test workflow (vs implicit/omitted) (#12889)
- Fix references in deprecation notes (#12877)
- Fix typo in `hash` docstring (#12879)
- Fix docstring for deprecated `list.take` (#12873)
- Note that `list.take` is deprecated (#12867)
- Fix failing tests (#12859)
- Add quotes to `pip install` with dependencies (#12799)
- Fix parameter name reference in `update` docstring (#12797)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @Object905, @Yerachmiel-Feltzman, @alexander-beedie, @c-peters, @ion-elgreco, @jankislinger, @mcrumiller, @nameexhaustion, @oli-clive-griffin, @orlp, @rancomp, @ritchie46, @romanovacca, @stinodego and @xuestrange