💥 Breaking changes
- Consistently convert to given time zone in Series constructor (#16828)
- Update reshape to return Array types instead of List types (#16825)
- Default to raising on out-of-bounds indices in all get/gather operations (#16841)
- Native selector XOR set operation, guarantee consistent selector column-order (#16833)
- Set infer_schema_length as keyword-only argument in str.json_decode (#16835)
- Update set_sorted to only accept a single column (#16800)
- Update group_by iteration and partition_by to always return tuple keys (#16793)
- Default to coalesce=False in left outer join (#16769)
- Remove pyxlsb engine from read_excel (#16784)
- Remove deprecated parameters in Series.cut/qcut and update struct field names (#16741)
- Expedited removal of certain deprecated functionality (#16754)
- Remove deprecated top_k parameters nulls_last, maintain_order, and multithreaded (#16599)
- Update some error types to more appropriate variants (#15030)
- Scheduled removal of deprecated functionality (#16715)
- Enforce deprecation of offset arg in truncate and round (#16655)
- Change default offset in group_by_dynamic from 'negative every' to 'zero' (#16658)
- Constrain access to globals from DataFrame.sql in favor of top-level pl.sql (#16598)
- Read 2D NumPy arrays as multidimensional Array instead of List (#16710)
- Update clip to no longer propagate nulls in the given bounds (#14413)
- Change str.to_datetime to default to microsecond precision for format specifiers "%f" and "%.f" (#13597)
- Update resulting column names in pivot when pivoting by multiple values (#16439)
- Preserve nulls in ewm_mean, ewm_std, and ewm_var (#15503)
- Restrict casting for temporal data types (#14142)
- Support Decimal types by default when converting from Arrow (#15324)
- Remove serde functionality from pl.read_json and DataFrame.write_json (#16550)
- Update function signature of nth to allow positional input of indices, remove columns parameter (#16510)
- Rename struct fields of rle output to len/value and update data type of len field (#15249)
- Remove class variables from some DataTypes (#16524)
- Add check_names parameter to Series.equals and default to False (#16610)
⚠️ Deprecations
- Deprecate LazyFrame.with_context (#16860)
- Rename parameter descending to reverse in top_k methods (#16817)
- Rename str.concat to str.join (#16790)
- Deprecate arctan2d (#16786)
🚀 Performance improvements
- Optimize string/binary sort (#16871)
- Use split_at in split (#16865)
- Use split_at instead of double slice in chunk splits (#16856)
- Don't rechunk in align_ if arrays are aligned (#16850)
- Don't create small chunks in parallel collect (#16845)
- Add dedicated no-null branch in arg_sort (#16808)
- Speed up dt.offset_by 2x for constant durations (#16728)
- Toggle coalesce if non-coalesced key isn't projected (#16677)
- Make dt.truncate 1.5x faster when every is just a single duration (and not an expression) (#16666)
- Always prune unused columns in semi/anti join (#16665)
✨ Enhancements
- Consistently convert to given time zone in Series constructor (#16828)
- Improve read_csv SQL table reading function defaults (better handle dates) (#16866)
- Support SQL VALUES clause and inline renaming of columns in CTE & derived table definitions (#16851)
- Support Python Enum values in lit (#16858)
- Convert to given time zone in .str.to_datetime when values are offset-aware (#16742)
- Update reshape to return Array types instead of List types (#16825)
- Default to raising on out-of-bounds indices in all get/gather operations (#16841)
- Support SQL "SELECT" with no tables, optimise registration of globals (#16836)
- Native selector XOR set operation, guarantee consistent selector column-order (#16833)
- Extend recognised EXTRACT and DATE_PART SQL part abbreviations (#16767)
- Improve error message when raising integers to negative integers, improve docs (#16827)
- Return datetime for mean/median of Date column (#16795)
- Only accept a single column in set_sorted (#16800)
- Expose overflowing cast (#16805)
- Update group-by iteration to always return tuple keys (#16793)
- Support array arithmetic for equally sized shapes (#16791)
- Default to coalesce=False in left outer join (#16769)
- More removal of deprecated functionality (#16779)
- Removal of read_database_uri passthrough from read_database (#16783)
- Remove pyxlsb engine from read_excel (#16784)
- Add check_order parameter to assert_series_equal (#16778)
- Enforce deprecation of keyword arguments as positional (#16755)
- Support cloud storage in scan_csv (#16674)
- Streamline SQL INTERVAL handling and improve related error messages, update sqlparser-rs lib (#16744)
- Support use of ordinal values in SQL ORDER BY clause (#16745)
- Support executing polars SQL against pandas and pyarrow objects (#16746)
- Remove deprecated parameters in Series.cut/qcut (#16741)
- Expedited removal of certain deprecated functionality (#16754)
- Remove deprecated functionality from rolling methods (#16750)
- Update date_range to no longer produce datetime ranges (#16734)
- Mark min_periods as keyword-only for rolling methods (#16738)
- Remove deprecated top_k parameters (#16599)
- Support order-by in window functions (#16743)
- Add SQL support for NULLS FIRST/LAST ordering (#16711)
- Update some error types to more appropriate variants (#15030)
- Initial SQL support for INTERVAL strings (#16732)
- More scheduled removal of deprecated functionality (#16724)
- Scheduled removal of deprecated functionality (#16715)
- Enforce deprecation of offset arg in truncate and round (#16655)
- Change default of offset in group_by_dynamic from "negative every" to "zero" (#16658)
- Constrain access to globals from df.sql in favour of top-level pl.sql (#16598)
- Read 2D numpy arrays as Array[dt, shape] instead of List[List[dt]]
- Activate decimal by default (#16709)
- Do not propagate nulls in clip bounds (#14413)
- Change .str.to_datetime to default to microsecond precision for format specifiers "%f" and "%.f" (#13597)
- Remove redundant column name when pivoting by multiple values (#16439)
- Preserve nulls in ewm_mean, ewm_std, and ewm_var (#15503)
- Restrict casting for temporal data types (#14142)
- Add many more auto-inferable datetime formats for str.to_datetime (#16634)
- Support decimals by default when converting from Arrow (#15324)
- Remove serde functionality from pl.read_json and DataFrame.write_json (#16550)
- Update function signature of nth to allow positional input of indices, remove columns parameter (#16510)
- Rename struct fields of rle output to len/value and update data type of len field (#15249)
- Remove default class variable values on DataTypes (#16524)
- Add check_names parameter to Series.equals and default to False (#16610)
- Dedicated SQLInterface and SQLSyntax errors (#16635)
- Add DIV function support to the SQL interface (#16678)
- Support non-coalescing streaming left join (#16672)
- Allow wildcard and exclude before struct expansions (#16671)
🐞 Bug fixes
- Fix should_rechunk check (#16852)
- Ensure read_excel and read_ods return identical frames across all engines when given empty spreadsheet tables (#16802)
- Consistent behaviour when "infer_schema_length=0" for read_excel (#16840)
- Standardised additional SQL interface errors (#16829)
- Ensure that a split ChunkedArray also flattens chunks (#16837)
- Reduce needless panics in comparisons (#16831)
- Reset if next caller clones inner series (#16812)
- Raise on non-positive json schema inference (#16770)
- Rewrite implementation of top_k/bottom_k and fix a variety of bugs (#16804)
- Fix comparison of UInt64 with zero (#16799)
- Fix incorrect parquet statistics written for UInt64 values > Int64::MAX (#16766)
- Fix boolean distinct (#16765)
- DATE_PART SQL syntax/parsing, improve some error messages (#16761)
- Include pl. qualifier for inner dtypes in to_init_repr (#16235)
- Column selection wasn't applied when reading CSV with no rows (#16739)
- Panic on empty df / null List(Categorical) (#16730)
- Only flush if operator can flush in streaming outer join (#16723)
- Raise unsupported cat array (#16717)
- Assert SQLInterfaceError is raised (#16713)
- Restrict casting for temporal data types (#14142)
- Handle nested categoricals in assert_series_equal when categorical_as_str=True (#16700)
- Improve read_database check for SQLAlchemy async Session objects (#16680)
- Reduce scope of multi-threaded numpy conversion (#16686)
- Full null on dyn int (#16679)
- Fix filter shape on empty null (#16670)
📖 Documentation
- Update version switcher for 1.0.0 prereleases (#16847)
- Update link from Python API reference to user guide (#16849)
- Update docstring/test/etc usage of select and with_columns to idiomatic form (#16801)
- Update versioning docs for 1.0.0 (#16757)
- Add docstring example for DataFrame.limit (#16753)
- Fix incorrect stated value of include_nulls in DataFrame.update docstring (#16701)
- Update deprecation docs in the user guide (#14315)
- Add example for index count in DataFrame.rolling (#16600)
- Improve docstring of Expr/Series.map_elements (#16079)
- Add missing polars.sql docs entry and small docstring update (#16656)
🛠️ Other improvements
- Remove inner Arc from FileCacheEntry (#16870)
- Do not update stable API reference on prerelease (#16846)
- Update links to API references (#16843)
- Prepare update of API reference URLs (#16816)
- Rename allow_overflow to wrap_numerical (#16807)
- Set infer_schema_length as keyword-only for str.json_decode (#16835)
- Don't enter streaming engine for groupby-> agg mean/median … (#16810)
- Improve safety of amortized_iter (#16820)
- Remove needless inner type clone (#16718)
- Fix incorrect debug assertion in ChunkedArray::from_chunks_and_dtype (#16697)
- Update version resolver for 1.0.0 release (#16705)
- Avoid AWS pinning to outdated crc32c version (#16681)
Thank you to all our contributors for making this release possible!
@JulianCologne, @KDruzhkin, @MarcoGorelli, @Object905, @alexander-beedie, @bertiewooster, @coastalwhite, @datenzauberai, @dependabot, @dependabot[bot], @henryharbeck, @marenwestermann, @mcrumiller, @montanarograziano, @nameexhaustion, @orlp, @ritchie46, @siddharth-gulia, @stinodego, @universalmind303 and @wence-