⚠️ Deprecations
- Rename `threadpool_size` to `thread_pool_size` (#14236)
🚀 Performance improvements
- prune parquet row groups when `is_not_null` is used (#14260)
- Avoid unnecessary copies in `Series.to_numpy` for boolean/temporal types (#14261)
- use is_between to skip parquet row groups (#14244)
- Use a compression API that is designed for this use case (#11699) (#14194)
- Use `UnitVec` in polars-plan traversal (#14199)
- use `UnitVec` in streaming joins (#14197)
- improve `ChunkId` (#14175)
- improve iteration performance (#14126)
- elide unneeded work in window? (#14108)
- run window functions more in parallel (#14095)
- improve skip row group using statistics condition (#14056)
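
For readers unfamiliar with row-group pruning, the idea behind the `is_between` and `is_not_null` entries above can be sketched in plain Python. This is only an illustration, not the Polars implementation: `RowGroupStats`, `can_skip_is_between`, and `can_skip_is_not_null` are hypothetical names, and the sketch assumes each row group carries min/max/null-count statistics for the filtered column.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowGroupStats:
    """Hypothetical per-column statistics for one row group."""

    min_value: Optional[int]  # None when the group has no non-null values
    max_value: Optional[int]
    null_count: int
    num_rows: int


def can_skip_is_between(stats: RowGroupStats, lower: int, upper: int) -> bool:
    """Return True if no row can satisfy lower <= x <= upper."""
    if stats.min_value is None or stats.max_value is None:
        # Without min/max we can only skip when every row is null,
        # because a null never satisfies the range check.
        return stats.null_count == stats.num_rows
    # The group's value range lies entirely outside the closed interval.
    return stats.max_value < lower or stats.min_value > upper


def can_skip_is_not_null(stats: RowGroupStats) -> bool:
    """Return True if an is_not_null filter would keep nothing."""
    return stats.null_count == stats.num_rows


groups = [
    RowGroupStats(min_value=0, max_value=9, null_count=0, num_rows=10),
    RowGroupStats(min_value=10, max_value=19, null_count=0, num_rows=10),
    RowGroupStats(min_value=None, max_value=None, null_count=10, num_rows=10),
]

# Indices of row groups that still have to be read for a 12..15 range filter.
to_read = [i for i, g in enumerate(groups) if not can_skip_is_between(g, 12, 15)]
```

Only the second group overlaps the requested range, so it is the only one that would need to be decoded in this sketch.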
✨ Enhancements
- add `u8`/`i8`/`u16`/`i16` parsers to CSV reader (#14241)
- move `F-order` data in and out of numpy to polars zero copy (#14259)
- read arrow-c-interface without requiring pyarrow (#14254)
- Implements `list.gather_every` (#14253)
- Implements `prefix/suffix_fields` (#14251)
- Change `Series.to_numpy` to return `f64` for `Int32/UInt32` Series with nulls instead of `f32` (#14240)
- Polish decimal arithmetic (#14172)
- improved `read_excel` format detection, and support for excel 97-2004 workbooks (#14234)
- Introduce `arr.to_struct` (#14202)
- Supports map fields name of struct (#14203)
- make `IdxVec` generic as `UnitVec` (#14196)
- add new arithmetic kernels (#14026)
- Supports `unique` and `hash_rows` for `null` column (#14111)
- Implement arithmetic operations for `Null` columns (#14107)
- support pd.Index in from_pandas and elsewhere (#14087)
- Allow renaming expressions with keyword syntax in `group_by` (#14071)
- raise more informative error message if someone lands on Expr.__bool__ (#14067)
- Adapt extend_constant to function expr architecture and expressify it (#14058)
- add integer negation (#14049)
- `list` & `array` measures of dispersion (#13245)
- gc binview when writing ipc (#14035)
- When calling `convert_time_zone` on time-zone-naive datetime, convert as if converting from UTC (#13960)
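
A plausible motivation for the `Series.to_numpy` change above (this is an assumption, not something the entry states) is precision: a 32-bit float has a 24-bit significand, so not every 32-bit integer survives a round trip through `f32`, whereas `f64` represents all of them exactly. The sketch below uses only the standard library; `roundtrip_f32` and `roundtrip_f64` are hypothetical helper names.

```python
import struct


def roundtrip_f32(value: int) -> float:
    """Store an integer as a 32-bit float and read it back."""
    return struct.unpack("<f", struct.pack("<f", float(value)))[0]


def roundtrip_f64(value: int) -> float:
    """Store an integer as a 64-bit float and read it back."""
    return struct.unpack("<d", struct.pack("<d", float(value)))[0]


# 2**24 + 1 fits comfortably in Int32 but is not representable in f32.
value = 2**24 + 1

lossy = roundtrip_f32(value)   # differs from the original integer
exact = roundtrip_f64(value)   # equals the original integer
```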
🐞 Bug fixes
- deduplicate recursive growables (#14264)
- Fix `glimpse` overload signature (#14258)
- allow set operations on list of categoricals (#14110)
- `any/all_horizontal` with single input has incorrect type (#14256)
- load numpy array with np array values #14237 (#14238)
- Make `Series.to_numpy` on booleans without nulls return `bool` type (#14239)
- fix ufunc in agg (change __ufunc_array__ so it uses `is_elementwise=True` parameter) (#14135)
- Fix join validation for String types (#14229)
- enable windows test coverage for `read_excel` "calamine" (fastexcel) engine (#14171)
- make csv parser more robust to edge cases (#14210)
- Fix for `set_operations` of binary dtype (#14152)
- fix read_csv date/datetime inference and parsing (#14113)
- don't see files as hive partitions (#14128)
- allow eval on list of categoricals (#14132)
- Forbid casting from `Date` to `Time` and vice versa (#14127)
- preserve old naming convention for multi-value pivot (this will change in 1.0 to no longer redundantly have the column name in the middle) (#14120)
- Implements `gt/lt` cmp for null dtype (#14119)
- ignore comments at beginning of csv if schema provided (#14115)
- fix pivot when multiple columns are passed. Output is now aligned with what tidyverse / pandas.pivot_table would do (#14048)
- multiple `read_excel` updates (#14039)
- some temporal conversion errors for datetimes earlier than `1970-01-01` (#14050)
- Preserve name when casting from categorical (#14085)
- respect `Object` dtype designation (#14072)
- fix cse bug when window function is nested (#14070)
- Fix `melt` panic when there are no value vars (#14057)
- `json_encode` should respect the logical type (#14063)
- improve skip row group using statistics condition (#14056)
- Raise for .dt.epoch and .dt.timestamp for Duration dtype (#13962)
- handle `SliceSink` with empty data (#14025)
- Allow `Series.to_pandas` for categorical types (#14028)
- correct field type schema inference (using read_csv) (#14042)
- Use int formatter for unsigned ints (#14043)
📖 Documentation
- fix code block in user-guide/lazy/schemas (#14228)
- Add visualization page to user guide (#13052)
- Fix typo in contributing guide (#14181)
- Small improvements Ecosystem page (#14176)
- fix code blocks in user-guide/concepts/data-structures (#14146)
- Document that Kleene logic is followed in `any_horizontal` and `all_horizontal` (#14148)
- Fix description of `return_dtype` parameter for `map_elements` and `map_batches` (#14114)
- Fix bullet point formatting in CI contributing guide (#14117)
- Add documentation on replacement strings to `str.replace` and `str.replace_all` (#13382)
- Replace alternatives page with more objective comparison (#13784)
- Note that only one `name` operation is allowed per expression (#14075)
- Improve deprecation message of `dtype_if_empty` param (#14068)
- fix more docstring bullet points (#14065)
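
As a reminder of what Kleene logic means for the `any_horizontal` / `all_horizontal` entry above, here is a minimal plain-Python sketch in which `None` stands for a null value. The helpers `kleene_any` and `kleene_all` are hypothetical names for illustration and are not Polars functions.

```python
from typing import Optional, Sequence


def kleene_any(values: Sequence[Optional[bool]]) -> Optional[bool]:
    """Three-valued OR: a single True decides the result."""
    if any(v is True for v in values):
        return True
    if any(v is None for v in values):
        # No True was seen, but an unknown value could still be True.
        return None
    return False


def kleene_all(values: Sequence[Optional[bool]]) -> Optional[bool]:
    """Three-valued AND: a single False decides the result."""
    if any(v is False for v in values):
        return False
    if any(v is None for v in values):
        # No False was seen, but an unknown value could still be False.
        return None
    return True
```

Under these rules a null only propagates when it could change the outcome: `kleene_any([True, None])` is `True`, while `kleene_any([False, None])` is `None`.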
🛠️ Other improvements
- Reorganize NumPy interop tests (#14257)
- additional dataframe test coverage (#14243)
- Remove `*args` in `Series.to_numpy` (#14248)
- Move metadata utils to `meta` module (#14230)
- remove unused method DataFrame._from_dicts (#14212)
- make gather_chunked completely generic (#14195)
- Add `.cargo` directory to `.gitignore` (#14191)
- `take_chunked` to polars-ops (#14185)
- Issue a warning when running doctests on Python 3.11 or lower (#14187)
- Run `cargo update` (#14160)
- merge take kernels (#14137)
- improve From<Ca> -> Vec (#14123)
- hoist boolean -> string cast (#14122)
- remove unused argument (#14014)
Thank you to all our contributors for making this release possible!
@JulianCologne, @MarcoGorelli, @Vincenthays, @Wainberg, @alexander-beedie, @apcamargo, @braaannigan, @c-peters, @deanm0000, @dependabot, @dependabot[bot], @dpinol, @edavisau, @eitsupi, @flisky, @grinya007, @ion-elgreco, @itamarst, @lukemanley, @mcrumiller, @orlp, @r-brink, @reswqa, @ritchie46, @stinodego and @taki-mekhalfa