Polars: py-0.20.7 Release

Release date: February 4, 2024
Previous version: py-0.20.6 (released January 26, 2024)
Magnitude: 5,979 Diff Delta
Contributors: 24 total committers

111 Commits in this Release

Top Contributors in py-0.20.7

ritchie46
stinodego
reswqa
grinya007
ion-elgreco
MarcoGorelli
taki-mekhalfa
Wainberg
mcrumiller
liam-brannigan-5bfe

Release Notes Published

⚠️ Deprecations

  • Rename threadpool_size to thread_pool_size (#14236)

🚀 Performance improvements

  • prune parquet row groups when is_not_null is used (#14260)
  • Avoid unnecessary copies in Series.to_numpy for boolean/temporal types (#14261)
  • use is_between to skip parquet row groups (#14244)
  • Use a compression API that is designed for this use case (#11699) (#14194)
  • Use UnitVec in polars-plan traversal (#14199)
  • use UnitVec in streaming joins (#14197)
  • improve ChunkId (#14175)
  • improve iteration performance (#14126)
  • elide unneeded work in window? (#14108)
  • run window functions more in parallel (#14095)
  • improve skip row group using statistics condition (#14056)
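
Several of the items above (#14260, #14244, #14056) concern skipping Parquet row groups based on column statistics. The general idea can be sketched in plain Python as follows. This is an illustration only, not Polars' implementation: `RowGroupStats` and both helper functions are hypothetical names, and the sketch assumes that a missing min/max means every value in the group is null.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowGroupStats:
    """Hypothetical per-row-group column statistics (not a Polars type)."""

    min_value: Optional[int]  # assumed None when every value in the group is null
    max_value: Optional[int]
    null_count: int
    num_rows: int


def may_match_is_between(stats: RowGroupStats, lower: int, upper: int) -> bool:
    """Return False only when no row can satisfy lower <= x <= upper."""
    if stats.min_value is None or stats.max_value is None:
        # All-null group (by the assumption above): a range filter never matches null.
        return False
    # The group's [min, max] interval must overlap the requested [lower, upper].
    return stats.max_value >= lower and stats.min_value <= upper


def may_match_is_not_null(stats: RowGroupStats) -> bool:
    """Return False only when every row in the group is null."""
    return stats.null_count < stats.num_rows


groups = [
    RowGroupStats(min_value=0, max_value=9, null_count=0, num_rows=10),
    RowGroupStats(min_value=10, max_value=19, null_count=2, num_rows=10),
    RowGroupStats(min_value=None, max_value=None, null_count=10, num_rows=10),
]

# Indices of the row groups that still have to be read for each predicate.
between_hits = [i for i, g in enumerate(groups) if may_match_is_between(g, 12, 15)]
not_null_hits = [i for i, g in enumerate(groups) if may_match_is_not_null(g)]
```

A group that passes the check may still contain no matching rows; the statistics only prove when a group can be skipped, so the actual filter still runs on whatever is read.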

✨ Enhancements

  • add u8/i8/u16/i16 parsers to CSV reader (#14241)
  • move F-order data in and out of numpy to polars zero copy (#14259)
  • read arrow-c-interface without requiring pyarrow (#14254)
  • Implements list.gather_every (#14253)
  • Implements prefix/suffix_fields (#14251)
  • Change Series.to_numpy to return f64 for Int32/UInt32 Series with nulls instead of f32 (#14240)
  • Polish decimal arithmetic (#14172)
  • improved read_excel format detection, and support for excel 97-2004 workbooks (#14234)
  • Introduce arr.to_struct (#14202)
  • Supports map fields name of struct (#14203)
  • make IdxVec generic as UnitVec (#14196)
  • add new arithmetic kernels (#14026)
  • Supports unique and hash_rows for null column (#14111)
  • Implement arithmetic operations for Null columns (#14107)
  • support pd.Index in from_pandas and elsewhere (#14087)
  • Allow renaming expressions with keyword syntax in group_by (#14071)
  • raise more informative error message if someone lands on Expr.__bool__ (#14067)
  • Adapt extend_constant to function expr architecture and expressify it (#14058)
  • add integer negation (#14049)
  • list & array measures of dispersion (#13245)
  • gc binview when writing ipc (#14035)
  • When calling convert_time_zone on time-zone-naive datetime, convert as if converting from UTC (#13960)
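
The last item (#13960) describes a semantic rule: a time-zone-naive datetime is treated as UTC before being converted. A minimal standard-library sketch of that rule is shown here; it does not use Polars, `convert_naive_as_utc` is a hypothetical helper, and a fixed UTC+5 offset stands in for a named time zone.

```python
from datetime import datetime, timedelta, timezone


def convert_naive_as_utc(naive: datetime, target: timezone) -> datetime:
    """Interpret a naive datetime as UTC, then express it in the target zone."""
    if naive.tzinfo is not None:
        raise ValueError("expected a time-zone-naive datetime")
    # Attach UTC without shifting the wall-clock value, then convert.
    return naive.replace(tzinfo=timezone.utc).astimezone(target)


# Fixed offset used purely for illustration.
plus_five = timezone(timedelta(hours=5))
converted = convert_naive_as_utc(datetime(2024, 2, 4, 22, 30), plus_five)
```

Here 22:30 on February 4 is read as 22:30 UTC, so the converted wall-clock time moves five hours forward and crosses into February 5.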

🐞 Bug fixes

  • deduplicate recursive growables (#14264)
  • Fix glimpse overload signature (#14258)
  • allow set operations on list of categoricals (#14110)
  • any/all_horizontal with single input has incorrect type (#14256)
  • load numpy array with np array values #14237 (#14238)
  • Make Series.to_numpy on booleans without nulls return bool type (#14239)
  • fix ufunc in agg (change __ufunc_array__ so it uses is_elementwise=True parameter) (#14135)
  • Fix join validation for String types (#14229)
  • enable windows test coverage for read_excel "calamine" (fastexcel) engine (#14171)
  • make csv parser more robust to edge cases (#14210)
  • Fix for set_operations of binary dtype (#14152)
  • fix read_csv date/datetime inference and parsing (#14113)
  • don't see files as hive partitions (#14128)
  • allow eval on list of categoricals (#14132)
  • Forbid casting from Date to Time and vice versa (#14127)
  • preserve old naming convention for multi-value pivot (this will change in 1.0 to no longer redundantly have the column name in the middle) (#14120)
  • Implements gt/lt cmp for null dtype (#14119)
  • ignore comments at beginning of csv if schema provided (#14115)
  • fix pivot when multiple columns are passed. Output is now aligned with what tidyverse / pandas.pivot_table would do (#14048)
  • multiple read_excel updates (#14039)
  • some temporal conversion errors for datetimes earlier than 1970-01-01 (#14050)
  • Preserve name when casting from categorical (#14085)
  • respect Object dtype designation (#14072)
  • fix cse bug when window function is nested (#14070)
  • Fix melt panic when there are no value vars (#14057)
  • json_encode should respect the logical type (#14063)
  • improve skip row group using statistics condition (#14056)
  • Raise for .dt.epoch and .dt.timestamp for Duration dtype (#13962)
  • handle SliceSink with empty data (#14025)
  • Allow Series.to_pandas for categorical types (#14028)
  • correct field type schema inference (using read_csv) (#14042)
  • Use int formatter for unsigned ints (#14043)

📖 Documentation

  • fix code block in user-guide/lazy/schemas (#14228)
  • Add visualization page to user guide (#13052)
  • Fix typo in contributing guide (#14181)
  • Small improvements Ecosystem page (#14176)
  • fix code blocks in user-guide/concepts/data-structures (#14146)
  • Document that Kleene logic is followed in any_horizontal and all_horizontal (#14148)
  • Fix description of return_dtype parameter for map_elements and map_batches (#14114)
  • Fix bullet point formatting in CI contributing guide (#14117)
  • Add documentation on replacement strings to str.replace and str.replace_all (#13382)
  • Replace alternatives page with more objective comparison (#13784)
  • Note that only one name operation is allowed per expression (#14075)
  • Improve deprecation message of dtype_if_empty param (#14068)
  • fix more docstring bullet points (#14065)
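
One of the documentation items above (#14148) notes that any_horizontal and all_horizontal follow Kleene logic, where a null means "unknown". The truth rules can be sketched in plain Python with `None` standing in for null. The helpers `kleene_any` and `kleene_all` are hypothetical and only illustrate the logic, not Polars' code.

```python
from typing import Iterable, Optional


def kleene_any(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """Three-valued OR: True wins; otherwise any null makes the result unknown."""
    saw_null = False
    for value in values:
        if value is True:
            return True
        if value is None:
            saw_null = True
    return None if saw_null else False


def kleene_all(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """Three-valued AND: False wins; otherwise any null makes the result unknown."""
    saw_null = False
    for value in values:
        if value is False:
            return False
        if value is None:
            saw_null = True
    return None if saw_null else True


# Each tuple plays the role of one row across two boolean columns.
rows = [(True, None), (False, None), (False, False), (True, True)]
any_result = [kleene_any(row) for row in rows]
all_result = [kleene_all(row) for row in rows]
```

The row `(True, None)` is the instructive case: OR yields True because one known True is enough, while AND yields null because the unknown value could still be False.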

πŸ› οΈ Other improvements

  • Reorganize NumPy interop tests (#14257)
  • additional dataframe test coverage (#14243)
  • Remove *args in Series.to_numpy (#14248)
  • Move metadata utils to meta module (#14230)
  • remove unused method DataFrame._from_dicts (#14212)
  • make gather_chunked completely generic (#14195)
  • Add .cargo directory to .gitignore (#14191)
  • take_chunked to polars-ops (#14185)
  • Issue a warning when running doctests on Python 3.11 or lower (#14187)
  • Run cargo update (#14160)
  • merge take kernels (#14137)
  • improve From<Ca> -> Vec (#14123)
  • hoist boolean -> string cast (#14122)
  • remove unused argument (#14014)

Thank you to all our contributors for making this release possible! @JulianCologne, @MarcoGorelli, @Vincenthays, @Wainberg, @alexander-beedie, @apcamargo, @braaannigan, @c-peters, @deanm0000, @dependabot, @dependabot[bot], @dpinol, @edavisau, @eitsupi, @flisky, @grinya007, @ion-elgreco, @itamarst, @lukemanley, @mcrumiller, @orlp, @r-brink, @reswqa, @ritchie46, @stinodego and @taki-mekhalfa