@forfudan
Collaborator

NuMojo v0.6 for MAX and mojo 25.1

forfudan and others added 20 commits January 28, 2025 16:14
…andint` (#199)

The `random` module was created at quite an early stage. A lot of things
should be re-considered. This PR aims to refactor the random module by:

1. Aligning the functional behaviors as much as possible with `numpy`
while keeping internal consistency of style: The `shape` always comes as
the first argument.
2. For all functions, accept `shape` as an `NDArrayShape`. Meanwhile, we
still provide overloads for `*shape: Int`. (`shape: List[Int]` is also
possible but not recommended).
3. `rand` now generates uniformly distributed values and does not accept
integral types.
4. `randint` is added to generate random integral values based on `low`
and `high`. Tests are added for it.
5. `random_exponential` is renamed as `exponential` ([same as
numpy](https://numpy.org/doc/stable/reference/random/generated/numpy.random.exponential.html#numpy.random.exponential)).
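
To illustrate the "shape comes first" convention, here is a rough plain-Python sketch (not NuMojo code). The names `rand` and `randint` follow the PR, but the signatures, the flat-list return value, and the exclusive `high` bound (borrowed from `numpy.random.randint`) are assumptions made for this example.

```python
import random
from math import prod


def rand(shape):
    """Return prod(shape) uniform floats in [0, 1) as a flat list."""
    return [random.random() for _ in range(prod(shape))]


def randint(shape, low, high):
    """Return prod(shape) random integers in [low, high) as a flat list.

    Assumption: `high` is exclusive, as in numpy.random.randint.
    """
    return [random.randrange(low, high) for _ in range(prod(shape))]


random.seed(0)  # fixed seed so that repeated runs give the same values
floats = rand((2, 3))
ints = randint((2, 3), 0, 10)
```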
…dtype` + unify functions (#200)

This PR makes some updates to the `statistics` module:

1. Add `returned_dtype` to several functions (`mean`, `median`) which
defaults to `f64`.
2. Add an overload of `mean` that calculates the average of all items
and returns a scalar. Remove the function `cummean`.
3. Add `variance` and `std` functions. Remove `cumpvariance` and
`cumpstd` functions (the formulae are not correct).
4. Incorporate the changes into the corresponding `NDArray`. Add
`variance` and `std` methods.
5. Fix the current tests and add tests for statistics module.
6. Add more detailed docstring for functions.
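
For reference, the formulae behind `mean`, `variance` and `std` can be sketched in plain Python as below. This is not the NuMojo implementation; the `ddof` argument is an assumption modelled on numpy, and results are always floats (mirroring the `f64` default of `returned_dtype`).

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of all items, returned as a float."""
    return sum(values) / len(values)


def variance(values, ddof=0):
    """Variance with `ddof` delta degrees of freedom (0 = population)."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - ddof)


def std(values, ddof=0):
    """Standard deviation, i.e. the square root of the variance."""
    return sqrt(variance(values, ddof))


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(mean(data), variance(data), std(data))  # 5.0 4.0 2.0
```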
Implements `broadcast_to()` for `NDArray`. Add tests.

It can broadcast an ndarray of any shape to any compatible shape. The
data will be copied into the new array. An example goes as follows.

```mojo
from numojo.prelude import *
from python import Python
fn main() raises:
    var np = Python.import_module("numpy")
    var a = nm.random.rand(Shape(2, 3))
    print(a)
    print(nm.routines.manipulation.broadcast_to(a, Shape(2, 2, 3)))
    print(np.broadcast_to(a.to_numpy(), (2, 2, 3)))
```

```console
[[0.8073 0.5361 0.4442]
 [0.9378 0.1910 0.2421]]
2D-array  Shape(2,3)  Strides(3,1)  DType: f64  C-cont: True  F-cont: False  own data: True

[[[0.8073 0.5361 0.4442]
  [0.9378 0.1910 0.2421]]
 [[0.8073 0.5361 0.4442]
  [0.9378 0.1910 0.2421]]]
3D-array  Shape(2,2,3)  Strides(6,3,1)  DType: f64  C-cont: True  F-cont: False  own data: True

[[[0.8074 0.5361 0.4442]
  [0.9378 0.1911 0.2421]]
 [[0.8074 0.5361 0.4442]
  [0.9378 0.1911 0.2421]]]
```
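
What "compatible shape" means can be sketched in plain Python, assuming NumPy's broadcasting rule (trailing axes must match or be 1 in the source). The helper names `can_broadcast` and `broadcast_copy` are hypothetical, and the data is a C-ordered flat list rather than an `NDArray`.

```python
from math import prod


def can_broadcast(src, dst):
    """True if shape `src` can be broadcast to shape `dst` (NumPy's rule)."""
    if len(src) > len(dst):
        return False
    # Compare trailing axes; missing leading axes of `src` count as size 1.
    return all(s == d or s == 1 for s, d in zip(reversed(src), reversed(dst)))


def broadcast_copy(flat, src, dst):
    """Copy a C-ordered flat buffer of shape `src` into one of shape `dst`."""
    if not can_broadcast(src, dst):
        raise ValueError(f"cannot broadcast {src} to {dst}")
    padded = (1,) * (len(dst) - len(src)) + tuple(src)
    # C-order strides of the source; size-1 axes get stride 0 so they repeat.
    strides, step = [], 1
    for n in reversed(padded):
        strides.append(0 if n == 1 else step)
        step *= n
    strides.reverse()
    out = []
    for k in range(prod(dst)):
        offset, rem = 0, k
        # Unravel k over `dst`, last axis first, and map it to a source offset.
        for n, st in zip(reversed(dst), reversed(strides)):
            offset += (rem % n) * st
            rem //= n
        out.append(flat[offset])
    return out


result = broadcast_copy([1, 2, 3, 4, 5, 6], (2, 3), (2, 2, 3))
```

As in the example above, the (2, 3) block is simply repeated along the new leading axis.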
…er docstrings (#205)

This PR aims to add all necessary boundary checks for `NDArrayShape` to
ensure a safe use.

- Add boundary checks for `ndim > 0` at initialization.
- Add boundary checks for `shape[i] > 0` at initialization.
- Add complete docstrings for all methods of `Shape` type, e.g.,
`raises`, `args`, `returns`.
…better docstrings (#206)

This PR aims to add all necessary boundary checks for `NDArrayStrides`
to ensure safe use.

- Add boundary checks for `ndim > 0` at initialization.
- Add complete docstrings for all methods of `Shape` type, e.g.,
`raises`, `args`, `returns`.
- Chain calling `__init__(shape: NDArrayShape, order: String)` for other
list-like shape argument.
- Fix `__eq__` method.
- Add new initialization method to create an uninitialized strides with
given length.
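
The relation between a shape, an order and the resulting strides can be sketched in plain Python as follows. The function name `strides_from_shape` is hypothetical; the two checks mirror the `ndim > 0` and `shape[i] > 0` conditions listed for `NDArrayShape`.

```python
def strides_from_shape(shape, order="C"):
    """Return element strides for `shape` in C (row-major) or F order."""
    if len(shape) == 0:
        raise ValueError("ndim must be greater than 0")
    if any(n <= 0 for n in shape):
        raise ValueError("every shape[i] must be greater than 0")
    if order not in ("C", "F"):
        raise ValueError("order must be 'C' or 'F'")
    # C order: the last axis varies fastest. F order: the first axis does.
    dims = reversed(shape) if order == "C" else shape
    strides, step = [], 1
    for n in dims:
        strides.append(step)
        step *= n
    if order == "C":
        strides.reverse()
    return tuple(strides)


print(strides_from_shape((2, 3, 4), "C"))  # (12, 4, 1)
print(strides_from_shape((2, 3, 4), "F"))  # (1, 2, 6)
```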
…207)

1. Allow calculating variance and std of an array by axis:
`numojo.statistics.variance()` and `numojo.statistics.std()`.
2. Add corresponding methods for `NDArray`.
3. Add auxiliary function `numojo.manipulation._broadcast_back_to()`.
4. Add tests.
5. Remove un-used imports.

This PR aims to improve the behavior of 0-dimensional arrays (numojo
scalars). Note that `a.item(0)` or `a[Item(0)]` is always preferred
because the behavior is more predictable, but we also allow some
***basic*** operations on 0darray to make users' life easier.

A 0-dimensional array cannot be constructed by users but can be obtained
by array indexing and slicing. Printing this variable gives the scalar
and a note that it is a 0darray instead of a mojo scalar. It is similar
to `numpy` in that `a[0]` returns a numpy scalar and `a.item(0)` returns
a Python scalar. For example,
```
>>> var a = nm.random.arange[f16](0, 3, 0.12)
>>> print(a[1])
0.11999512  (0darray[f16])
>>> print(a.item(1))
0.11999512
>>> var c = nm.array[f16]("[[1,2], [3,4]]")
>>> print(c[1, 1])
4.0  (0darray[f16])
```

0-dimensional array can be unpacked to get the corresponding mojo scalar
either by `[]` or by `item()`. For example,
```
>>> var a = nm.random.arange[f16](0, 3, 0.12)
>>> var b = a[1]
>>> print(b)
0.11999512  (0darray[f16])
>>> print(b[])  # Unpack using []
0.11999512
>>> print(b.item())  # Unpack using item()
0.11999512
```

0-dimensional array can be used in arithmetic operations, just like a
scalar.
```
>>> var a = nm.random.arange[f16](0, 3, 0.12)
>>> var b = a[1]
>>> var c = nm.array[f16]("[[1,2], [3,4]]")
>>> var d = c[1, 1]
>>> print(b - d)  # Arithmetic operations between two 0darrays
-3.8808594  (0darray[f16])
>>> print(b[] - d[])  # Arithmetic operations after unpacking
-3.8808594
>>> print(b < d)  # Comparison between 0darray and 0darray
True  (0darray[boolean])
>>> print(b == d[])  # Comparison between 0darray and unpacked 0darray
False  (0darray[boolean])
```
This PR:

- Updates the roadmap document according to our current progress.
- Remove the auto-generated `magic.lock` from the cache.
- Remove the `.readthedocs.yaml` from the cache.
- Update the toml file and update channels.
…ut (#210)

Adds the `Flags` type for storing information on memory layout. It
replaces the current `Dict[String, Bool]` type. The Flags object can
also be accessed like a dictionary. Short names are available for
convenience. It is similar to the `flags` object of a numpy array. Example:

```mojo
fn main() raises:
    var A = nm.random.rand(2, 3, 4)
    print(A.flags.C_CONTIGUOUS)
    print(A.flags["C_CONTIGUOUS"])
    print(A.flags["C"])
```

They all print `True`.
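
The three access styles can be mimicked in plain Python as below. This is only a sketch of the idea, not the `Flags` struct itself; the `F_CONTIGUOUS` field and the `"F"` short name are assumptions modelled on numpy.

```python
class Flags:
    """Memory-layout flags readable as attributes or like a dictionary."""

    # Short names mapped to full names ("F" is an assumption, as in numpy).
    _ALIASES = {"C": "C_CONTIGUOUS", "F": "F_CONTIGUOUS"}
    _NAMES = ("C_CONTIGUOUS", "F_CONTIGUOUS")

    def __init__(self, c_contiguous, f_contiguous):
        self.C_CONTIGUOUS = c_contiguous
        self.F_CONTIGUOUS = f_contiguous

    def __getitem__(self, key):
        name = self._ALIASES.get(key, key)
        if name not in self._NAMES:
            raise KeyError(key)
        return getattr(self, name)


flags = Flags(c_contiguous=True, f_contiguous=False)
print(flags.C_CONTIGUOUS, flags["C_CONTIGUOUS"], flags["C"])  # True True True
```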
Updates the code to accommodate Mojo v25.1. The changes include but
are not limited to:

Change constructors.
`str(` -> `String(`
`int(` -> `Int(`
`float(` -> `Float64(`

Change `index()` to `Int()`.
`index(T)` -> `Int(T)`

The function `isdigit()` becomes a method.
`isdigit(a)` -> `a.isdigit()`

Use explicit constructor for complex ndarray.
```mojo
self._re = NDArray[dtype](shape, order)
self._im = NDArray[dtype](shape, order)
```
…ate array by any axis (#212)

Adds the `NDAxisIter` struct and the `iter_by_axis` method, which iterate
over an array along any axis. In each iteration, the iterator yields a 1-d
array taken along the specified axis. It is useful when we want to write a
universal function to reduce the array along a certain axis.

Example:
```mojo
from numojo.prelude import *
var a = nm.arange[i8](24).reshape(Shape(2, 3, 4))
print(a)
for i in a.iter_by_axis(axis=0):
    print(String(i))
```

This prints:

```console
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]
 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]
3D-array  Shape(2,3,4)  Strides(12,4,1)  DType: i8  C-cont: True  F-cont: False  own data: True
[ 0 12]
[ 1 13]
[ 2 14]
[ 3 15]
[ 4 16]
[ 5 17]
[ 6 18]
[ 7 19]
[ 8 20]
[ 9 21]
[10 22]
[11 23]
```

Another example:

```mojo
from numojo.prelude import *
var a = nm.arange[i8](24).reshape(Shape(2, 3, 4))
print(a)
for i in a.iter_by_axis(axis=2):
    print(String(i))
```

This prints:

```console
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]
 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]
3D-array  Shape(2,3,4)  Strides(12,4,1)  DType: i8  C-cont: True  F-cont: False  own data: True
[0 1 2 3]
[4 5 6 7]
[ 8  9 10 11]
[12 13 14 15]
[16 17 18 19]
[20 21 22 23]
```
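
The iteration order shown above can be reproduced with a plain-Python generator over a C-ordered flat buffer. This is a sketch of the idea only, not the `NDAxisIter` implementation; the name `slices_along_axis` is hypothetical.

```python
from itertools import product


def slices_along_axis(flat, shape, axis):
    """Yield 1-d lists taken along `axis` from a C-ordered flat buffer."""
    ndim = len(shape)
    # C-order strides: the last axis has stride 1.
    strides = [1] * ndim
    for i in range(ndim - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    other_axes = [i for i in range(ndim) if i != axis]
    # Walk every index combination of the remaining axes in C order.
    for idx in product(*(range(shape[i]) for i in other_axes)):
        base = sum(i * strides[ax] for i, ax in zip(idx, other_axes))
        yield [flat[base + k * strides[axis]] for k in range(shape[axis])]


data = list(range(24))
lanes_axis0 = list(slices_along_axis(data, (2, 3, 4), axis=0))
lanes_axis2 = list(slices_along_axis(data, (2, 3, 4), axis=2))
print(lanes_axis0[:2])  # [[0, 12], [1, 13]]
print(lanes_axis2[:2])  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```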
… to work on any axis (#213)

1. Adds functions that can apply any function working on a 1-d
array along any axis, with or without dimension reduction.
- Add `apply_func_on_array_with_dim_reduction()` and
`apply_func_on_array_without_dim_reduction()`. They try to utilize
parallelization as much as possible.
- In future, we only need to focus on writing (and optimizing) functions
for 1-d arrays. Operating along a certain axis can be easily achieved by
applying the function, e.g.,
`apply_func_on_array_with_dim_reduction[max_1d](array, axis=axis)`.

3. Implements this approach on functions in the statistics.averages module
and on the `sort` function. The `sort` function gains a speed increase
compared to the old method and is quicker than `numpy.sort` for large
arrays.
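
The "write a 1-d function, apply it along any axis" idea can be sketched in plain Python as below (sequential, without the parallelization mentioned above). The helper name `reduce_along_axis` is hypothetical and the array is a C-ordered flat list plus a shape.

```python
from itertools import product


def reduce_along_axis(func, flat, shape, axis):
    """Apply a 1-d reducer `func` along `axis`; return (values, new_shape)."""
    ndim = len(shape)
    # C-order strides: the last axis has stride 1.
    strides = [1] * ndim
    for i in range(ndim - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    other_axes = [i for i in range(ndim) if i != axis]
    out = []
    for idx in product(*(range(shape[i]) for i in other_axes)):
        base = sum(i * strides[ax] for i, ax in zip(idx, other_axes))
        lane = [flat[base + k * strides[axis]] for k in range(shape[axis])]
        out.append(func(lane))  # one scalar per 1-d lane
    return out, tuple(shape[i] for i in other_axes)


data = list(range(24))
maxes, max_shape = reduce_along_axis(max, data, (2, 3, 4), axis=0)
sums, sum_shape = reduce_along_axis(sum, data, (2, 3, 4), axis=2)
print(max_shape, sum_shape)  # (3, 4) (2, 3)
```

Any reducer that accepts a 1-d sequence (here the built-in `max` and `sum`) can be plugged in without knowing about axes.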
…214)

## Changes

1. Refine `argsort` function by applying the universal function. Improve
the speed significantly (see below). Also, it fixes the problem that
`argsort` does not work for F-order array.
2. Improve the speed of `sort` for 1d-array by adding partition
functions which do not construct the indices array.
3. Update `_NDAxisIter` to allow order argument.
4. Re-write `ravel` function by means of `_NDAxisIter`, so that it will
not break for F-order arrays.
5. Add many tests for the functions, covering C-order and F-order
operations on both C and F arrays (4 different scenarios).
6. Add `FORC` attribute for the `Flags` type.

## Comparison

`argsort` numojo vs numpy:
```console
100000000 1-d array.
numojo 8.672953000001144
numpy 11.353579999995418

10_000 * 10_000 2-d array sorted by axis 0.
numojo 1.9524170000222512
numpy 4.66693300002953

10_000 * 10_000 2-d array sorted by axis 1.
numojo 0.5791429999517277
numpy 4.1895380000351
```
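
As a reminder of what `argsort` returns compared with `sort`, here is a minimal plain-Python sketch for the 1-d case. It relies on Python's built-in stable sort and says nothing about the partition-based algorithm used in NuMojo.

```python
def argsort(values):
    """Return the indices that would sort `values` in ascending order."""
    return sorted(range(len(values)), key=values.__getitem__)


values = [30, 10, 20, 10]
order = argsort(values)
print(order)                       # [1, 3, 2, 0]
print([values[i] for i in order])  # [10, 10, 20, 30]
```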
This PR changes the approach in determining the min and max values of
the printable regions of an array. This significantly improves the speed
of printing arrays.

This improvement is particularly significant when we encounter very
large arrays. The speed-up can exceed 100,000x. See the following
comparison on a (10000, 1000) array.

```console
# before the change
2D-array  Shape(10000,1000)  Strides(1000,1)  DType: f64  C-cont: True  F-cont: False  own data: True
Time to print array: 19.190531999978703

# after the change
2D-array  Shape(10000,1000)  Strides(1000,1)  DType: f64  C-cont: True  F-cont: False  own data: True
Time to print array: 0.0001010000123642385
```
…er and any axis + some optimization work (#216)

This PR updates the `numojo.math.extrema` module and performs some other
optimization work:

- Update `max()` and `min()` to allow both C and F order arrays and by
any axis.
- Unify all the overloads and function signatures. The `maxT()` and
`minT()` are removed.
- Update the `max()` and `min()` methods for `NDArray` type.
- Some other optimization work, including:
  - Use `//` and `%` to replace `divmod()` in all cases.
  - Use `a.size` attribute to replace `a.num_elements()` method in all
    cases.
  - Remove unnecessary copy of memory in the `apply_func_over_axis`
    functions.
  - Increase the speed of `nditer` by not re-constructing the strides in
    every loop.
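
One typical place where `//` and `%` appear in array code is converting a flat offset to a multi-dimensional index. The plain-Python sketch below is an illustration of that pattern only; it is an assumption that this is where the replacement matters in NuMojo, and the name `unravel` is hypothetical.

```python
def unravel(offset, shape):
    """Convert a C-order flat offset to a multi-dimensional index."""
    index = []
    for n in reversed(shape):
        index.append(offset % n)  # position on this axis
        offset //= n              # carry the rest to the next axis
    return tuple(reversed(index))


print(unravel(17, (2, 3, 4)))  # (1, 1, 1)
print(unravel(23, (2, 3, 4)))  # (1, 2, 3)
```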
#217)

This PR:

- Implements the `diagonal` function in `linalg.misc` module. Also
includes it in the NDArray as a method. Add tests for it.

- Fix the `NDArray.sort()` method (in-place sort). The default axis is
-1 rather than None. Add tests for it.
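
For a 2-d input, the behavior of a `diagonal` function can be sketched in plain Python as below. The `offset` argument is an assumption modelled on `numpy.diagonal`; the NuMojo signature may differ.

```python
def diagonal(matrix, offset=0):
    """Return the `offset`-th diagonal of a 2-d list of lists."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    # A positive offset starts to the right, a negative one further down.
    r0, c0 = max(-offset, 0), max(offset, 0)
    length = max(0, min(rows - r0, cols - c0))
    return [matrix[r0 + k][c0 + k] for k in range(length)]


m = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
print(diagonal(m))      # [0, 4, 8]
print(diagonal(m, 1))   # [1, 5]
print(diagonal(m, -1))  # [3, 7]
```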
… module for functional programming (#218)

- Move private functions `_apply_func_on_array...()` from `utility`
module to a new, dedicated module `numojo.routines.functional` that is
used for functional programming purposes.
- Rename the functions `_apply_func_on_array...()` as
`apply_along_axis()`, making them public so that users can call them
directly. The function fulfills the same goal as `numpy.apply_along_axis()`,
which executes a function working on 1-d arrays on the input n-d array
along the given axis.
- Rename `iter_by_axis()` as `iter_along_axis()` since meanings of these
two expressions are different. The former one will be reserved for
another purpose that will be implemented soon.
- Add unit tests for this function, e.g., C-order vs F-order, along axis
0, 1, and 2.
… + fix `__bool__()` (#219)

- Add `compress` function in indexing routine
(`numojo.routines.indexing`) which returns selected slices of an array
along the given axis, or from the flattened array when no `axis` is given.
- Add the function as one of the ndarray methods.
- Enhance the `NDArrayIter` to allow iterating over any dimension.
- Add `ith()` method to `NDArrayIter` to get the i-th item.
- Fix the `NDArray.__bool__()` method so that it returns a value only if
the size of the array is 1.
- Add a number of tests for `compress()`.

Example:
```mojo
print(a)
print(nm.indexing.compress(nm.array[boolean]("[1, 1, 1]"), a, axis=1))
print(np.compress(np.array([1, 1, 1]), anp, axis=1))
```

```console
[[[ 0  6 12 18]
  [ 2  8 14 20]
  [ 4 10 16 22]]
 [[ 1  7 13 19]
  [ 3  9 15 21]
  [ 5 11 17 23]]]
3D-array  Shape(2,3,4)  Strides(1,2,6)  DType: i8  C-cont: False  F-cont: True  own data: True
[[[ 0  6 12 18]
  [ 2  8 14 20]
  [ 4 10 16 22]]
 [[ 1  7 13 19]
  [ 3  9 15 21]
  [ 5 11 17 23]]]
3D-array  Shape(2,3,4)  Strides(12,4,1)  DType: i8  C-cont: True  F-cont: False  own data: True
[[[ 0  6 12 18]
  [ 2  8 14 20]
  [ 4 10 16 22]]

 [[ 1  7 13 19]
  [ 3  9 15 21]
  [ 5 11 17 23]]]
```
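
The selection rule of `compress` can be sketched in plain Python for the 2-d case, following the semantics of `numpy.compress` (a condition shorter than the axis simply truncates the output). This is a simplified illustration, not the NuMojo implementation.

```python
def compress(condition, rows, axis=None):
    """Select slices of a 2-d list of lists where `condition` is truthy."""
    if axis is None:
        # Without an axis, work on the flattened array.
        flat = [x for row in rows for x in row]
        return [x for keep, x in zip(condition, flat) if keep]
    if axis == 0:
        return [row for keep, row in zip(condition, rows) if keep]
    if axis == 1:
        return [[x for keep, x in zip(condition, row) if keep] for row in rows]
    raise ValueError("axis must be None, 0 or 1 in this 2-d sketch")


a = [[1, 2], [3, 4], [5, 6]]
print(compress([0, 1], a, axis=0))         # [[3, 4]]
print(compress([False, True], a, axis=1))  # [[2], [4], [6]]
print(compress([False, True], a))          # [2]
```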

---------

Co-authored-by: MadAlex1997 <[email protected]>
Implement `clip()` function for scalar a_min and a_max in `math.misc`
module. Add corresponding method in `NDArray`. Add tests.
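
For scalar bounds, clipping reduces to a `min`/`max` pair per element, as in this plain-Python sketch (the argument names `a_min` and `a_max` follow the commit message; the flat-list input is an assumption for the example).

```python
def clip(values, a_min, a_max):
    """Limit every item of `values` to the closed interval [a_min, a_max]."""
    return [min(max(x, a_min), a_max) for x in values]


clipped = clip([-2, 0.5, 3, 10], 0, 3)
print(clipped)  # [0, 0.5, 3, 3]
```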
…222)

1) This PR standardizes the docstring format according to the [Mojo
docstring style
guide](https://github.com/modular/mojo/blob/main/stdlib/docs/docstring-style-guide.md)
(and to be aligned with numpy), which is as follows,
```
Description:
Parameters:
Args:
Returns:
Raises:
See Also:
Notes:
References
Examples:
```
2) Add more descriptive errors in the internal functions of NDArray to
give better understanding of the errors and also their source.

---------

Co-authored-by: ZHU Yuhao 朱宇浩 <[email protected]>
@forfudan
Collaborator Author

@MadAlex1997 @shivasankarka, we could make a monthly release at the end of this month as v0.6 for MAX 25.1. This is a placeholder draft pull request. We can merge it when #221, #201, and other PRs are merged into branch pre-0.6.

Good to see that there is no conflict with the main branch.

- Improve `_NDIter` to allow arbitrary axis to travel.
- Add method `ith()` to get the i-th item of the iterator.
- Add `swapaxes()` for shape and strides.
- Add `offset()` for `Item` type to get offset.
- Constructor for `Item` from index and shape.
- Add tests for C or F array with `nditer` from C or F orders.
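
The arithmetic behind an offset lookup and an axis swap can be sketched in plain Python as below. The function names mirror the bullet points, but the signatures are assumptions; strides are counted in elements.

```python
def offset(index, strides):
    """Flat buffer offset of a multi-dimensional index for given strides."""
    return sum(i * s for i, s in zip(index, strides))


def swapaxes(shape, strides, axis1, axis2):
    """Swap two axes of a shape/strides pair without touching the data."""
    shape, strides = list(shape), list(strides)
    shape[axis1], shape[axis2] = shape[axis2], shape[axis1]
    strides[axis1], strides[axis2] = strides[axis2], strides[axis1]
    return tuple(shape), tuple(strides)


# Shape (2, 3, 4): C-order strides are (12, 4, 1), F-order are (1, 2, 6).
print(offset((1, 0, 2), (12, 4, 1)))              # 14
print(offset((1, 0, 2), (1, 2, 6)))               # 13
print(swapaxes((2, 3, 4), (12, 4, 1), 0, 2))      # ((4, 3, 2), (1, 4, 12))
```

Swapping the outer axes of a C-contiguous layout yields the strides of an F-contiguous array of the swapped shape, which is why no copy is needed.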
@forfudan forfudan marked this pull request as ready for review February 28, 2025 08:38
@MadAlex1997 MadAlex1997 merged commit 23f316c into main Mar 1, 2025
2 checks passed
@MadAlex1997 MadAlex1997 deleted the pre-0.6 branch March 1, 2025 01:23
4 participants