Remove multiple items from a numpy.ndarray without numpy.delete
I am using a large numpy.ndarray (11,000 x 3,180) to develop an active learning algorithm (text mining). In this algorithm, at each iteration I have to delete 16 samples (row vectors) from my dataset and then integrate them into the training set (which grows by 16 samples each time). After performing this process for approximately 60 iterations, the whole algorithm starts again from the beginning, and this is repeated 100 times.
To delete the set of 16 elements from my dataset, I use numpy.delete(dataset, listfoundx, axis=0), where listfoundx corresponds to the indexes of the selected objects to be removed.
This method works the first time (run 1 out of 100), but when the algorithm starts again, I get the following error:
new = empty(newshape, arr.dtype, arr.flags.fnc)
MemoryError
Apparently the numpy.delete method creates a copy of my dataset for each index (16 x 1.2 GB), which is more than the amount of memory on my computer.
The question is: how can I remove items from a numpy.ndarray without using too much memory and without excessive execution time?
PS1: I have tried the reverse process, keeping the elements whose indexes are not in the list of those to remove, but the process is very slow. PS2: Sometimes the error occurs before the restart of the algorithm (before iteration number 60).
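For illustration, here is a minimal, runnable sketch of the loop described in this question. The random selection of listfoundx is only a placeholder for the real active-learning choice, and the seeded generator and vstack call are assumptions, not the asker's actual code:

import numpy as np

rng = np.random.default_rng(0)
dataset = rng.random((11000, 3180))           # the large dataset from the question
trainset = np.empty((0, dataset.shape[1]))    # training set, grown by 16 rows per iteration

for _ in range(60):                           # roughly 60 iterations per run
    # placeholder for the active-learning selection of 16 rows
    listfoundx = rng.choice(dataset.shape[0], size=16, replace=False)
    trainset = np.vstack([trainset, dataset[listfoundx]])
    # np.delete returns a new array, so the whole remaining dataset is copied here
    dataset = np.delete(dataset, listfoundx, axis=0)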
It may help to understand exactly what np.delete does in your case:
newset = np.delete(dataset, listfoundx, axis=0)
In short it does:
keep = np.ones(dataset.shape[0], dtype=bool)  # boolean mask matching the first dimension, all True
keep[listfoundx] = False                      # mark the rows to delete
newset = dataset[keep, :]                     # boolean indexing keeps the rest
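You can confirm the equivalence on toy data (a small sketch; the array and indexes are made up for the check):

import numpy as np

dataset = np.arange(20).reshape(10, 2)    # toy array
listfoundx = [1, 4, 7]

keep = np.ones(dataset.shape[0], dtype=bool)
keep[listfoundx] = False

# both forms drop exactly the same rows
assert np.array_equal(np.delete(dataset, listfoundx, axis=0), dataset[keep, :])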
In other words, it constructs the boolean index rather than selecting the kept rows directly.

If I run

dataset = np.delete(dataset, listfoundx, axis=0)

repeatedly in an interactive shell, there is no accumulation of intermediate arrays. There will temporarily be a new copy of the array while the delete is evaluated, but with the assignment the old copy disappears. Are you sure it is delete that is causing the rising memory usage, as opposed to the growing training set?

As for speed, you could improve on that by maintaining a mask of all 'removed' rows instead of actually removing anything, as sketched below. But depending on how listfoundx overlaps with the previous deletions, updating this mask can be more trouble than it is worth. It is also likely to be more error prone.
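For concreteness, a minimal sketch of that mask-based approach. It sidesteps the overlap problem by always drawing listfoundx from the rows still marked alive in the original array (via np.flatnonzero); the random selection, the toy sizes, and the names alive and trainbatch are all assumptions for illustration:

import numpy as np

rng = np.random.default_rng(0)
dataset = rng.random((1000, 50))               # toy size; the question's array is 11,000 x 3,180

alive = np.ones(dataset.shape[0], dtype=bool)  # True while a row is still in the pool

for _ in range(60):
    # pick 16 indexes into the ORIGINAL array, restricted to rows still alive
    candidates = np.flatnonzero(alive)
    listfoundx = rng.choice(candidates, size=16, replace=False)
    alive[listfoundx] = False                  # "deleting" is just flipping the mask
    trainbatch = dataset[listfoundx]           # the 16 rows handed to the training set

remaining = dataset[alive]                     # materialize the pool only when needed

No row data is copied until the final boolean indexing, so memory use stays flat across iterations instead of paying for a full copy on every delete.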